EXECUTIVE SUMMARY


This  report  describes  the  second  and  final  phase  of  a  research  program 
aimed  at  the  utilization  of  multicomponent  analysis  by  non-fragmenting  mass 
spectrometry  for  diagnosis  of  infectious  diseases.  During  this  phase  it  has 
been  demonstrated  that  metabolic  profiles  of  the  host  can  be  used  to  identify 
infected  subjects,  to  differentiate  between  patients  with  different 
infections,  to  detect  a  viral  infection  prior  to  the  onset  of  clinical 
symptoms,  and  to  demonstrate  the  existence  of  a  viral  infection  some  time 
after  the  clinical  symptoms  have  subsided.  Further,  it  has  been  demonstrated 
that  mass  spectrometric  multicomponent  analysis  can  be  a  highly  useful  tool  in 
the clinical laboratory by detecting the presence of a virus through its
effect  on  the  metabolism  of  tissue  cultured  cells.  A  diagnostic  biochemical 
pattern  seems  to  be  distinguishable  within  a  few  hours,  which  is  significantly 
faster  than  by  the  presently  used  morphological  changes.  Also,  new  sample 
preparation  techniques,  instrumentation,  and  computerized  statistical  analysis 
have  been  developed  during  this  phase  of  the  program. 
I. INTRODUCTION

This  is  the  third  annual  and  final  report  on  the  second  phase  of  a  program 
on  "Mass  Spectrometric  Rapid  Diagnosis  of  Infectious  Diseases"  conducted  under 
contract  DAMD  177808035  with  the  U.S.  Army  Medical  Research  and  Development 
Command.  This  report  covers  the  period  February  1,  1980  through  February  28, 
1981  and  concludes  the  research  effort  under  this  contract. 

The  purpose  of  the  program  has  been  to  develop  a  methodology  for  the  rapid 
diagnosis  of  infectious  diseases  based  on  non-fragmenting  mass  spectrometry. 
The  experimental  approach  is  based  on  a  novel  methodology  and  instrumentation 
developed by the Principal Investigator at Stanford Research Institute
(SRI).  This  methodology  has  been  shown  to  facilitate  the  detection  of 
specific  metabolic  aberrations  that  occur  in  the  host  as  a  result  of  an 
infectious  process.  It  might  also  be  applied  to  the  detection  of  viruses  in 
tissue  cultures  and  to  the  identification  of  microorganisms  by  their  chemical 
constituents  or  by  their  characteristic  metabolic  products. 

The  capability  of  making  a  rapid  and  reliable  diagnosis  of  infectious 
diseases  at  an  early  stage  and  at  low  cost  would  be  of  especially  great  value 
to  the  military  where  large  numbers  of  army  personnel  are  stationed  in 
confined  areas  and  their  continuing  health  is  crucial  to  carrying  out  their 
objectives.  Early  and  reliable  diagnosis  of  an  infectious  disease  could 
prevent  the  spread  of  disease  to  larger  groups  of  military  and  civilian 
personnel. 

Viral infections are generally harder to diagnose than bacterial diseases,
and in many cases it is difficult even to establish whether a given set of
pathophysiological clinical symptoms is associated with a viral infection.
Further, the identification of the presence of a virus and the characterization
of a detected virus have to be carried out in tissue cultures in assays that may
take over 24 hours, or even 48 hours, to complete. The methodology developed
in this program of research is directed (1) at establishing if a viral
infection in humans leads to a specific host biochemical reaction expressed in
a  characteristic  change  in  the  metabolic  profile  of  urine,  and  (2)  at 
establishing  the  presence  and  possibly  the  identity  of  viruses  in  vitro  by 
monitoring  early  changes  in  the  metabolic  profiles  of  tissue  culture  media. 

We  have  reason  to  expect  the  detection  of  characteristic  biochemical  changes 
in  tissue  culture  media  before  any  morphological  changes  can  be  detected. 

During  the  second  phase  of  this  program,  we  have  achieved  a  number  of 
critical  objectives.  After  further  improvements  and  optimization  of  the 
sample  preparation  techniques  and  after  finding  optimal  conditions  for  mass 
spectrometric  analysis  (utilizing  a  double  focussing  configuration),  we  have 
devoted  a  substantial  effort  to  compare  and  critically  analyze  alternative 
statistical  multivariate  diagnostic  procedures.  Further,  a  substantial  part 
of  the  effort  has  been  devoted  to  the  analysis  of  clinical  samples  and  thus  to 
the  evaluation  of  the  potential  usefulness  of  our  methodology  as  a  clinical 
diagnostic  technique.  We  have  demonstrated  a  high  degree  of  success  in 
separating  a  number  of  different  groups  of  patients  by  the  metabolic  profiles 
of  their  urines.  These  included  patients  with  alcoholic  liver  disease, 
children  with  pneumonia  and  children  with  virus  induced  diarrhea;  the  patients 
could  be  readily  differentiated  by  comparison  with  healthy  subjects  of  the 
corresponding  age  groups  with  practically  no  false-positives  or 
false-negatives.  Next  we  have  carried  out  a  longitudinal  study  on  urine 
samples  obtained  from  groups  of  volunteer  subjects  vaccinated  with  live  virus 
of  sandfly  fever  and  dengue  fever,  who  were  followed  up  for  a  number  of 
weeks.  This  study  suggests  that  a  diagnostic  pattern  associated  with  the 
infection  can  be  detected  before  the  onset  of  clinical  symptoms  and  it 
subsides after their disappearance (Clin. Chem. 26, 1443 (1980)). Also,
significant  differences  in  the  rate  of  individual  reaction  to  the  infection 
could  be  observed. 

In  another  series  of  studies  we  have  shown  that  the  molecular  weight 
profile  of  a  human  and  animal  tissue  culture  medium  exhibits  significant 
changes  when  the  cells  are  infected  with  a  virus,  within  hours  following 
infection. Human lung tissue cultures infected with polio virus
demonstrated a diagnostically useful pattern 6 hours following exposure to the
virus (Clin. Chem. 26, 1443 (1980)). These feasibility studies should now be
continued to establish the scope and limitations of this early detection
technique.

In  brief,  during  the  second  phase  of  this  program,  preliminary  experiments 
have  shown  that  metabolic  profiles  of  the  host  can  be  used  to  identify 
infected  subjects,  to  differentiate  between  patients  with  different 
infections,  to  detect  a  viral  infection  prior  to  the  onset  of  clinical 
symptoms,  and  to  demonstrate  the  existence  of  a  viral  infection  some  time 
after  the  clinical  symptoms  have  subsided.  Further,  it  has  been  demonstrated 
that mass spectrometric multicomponent analysis can be a highly useful tool in
the  clinical  laboratory  by  detecting  the  presence  of  a  virus  through  its 
effect  on  the  metabolism  of  tissue  cultured  cells.  A  diagnostic  biochemical 
pattern  seems  to  be  distinguishable  within  a  few  hours,  which  is  significantly 
faster  than  by  the  presently  used  morphological  changes. 
II. SUMMARY OF ACCOMPLISHMENTS

The  objective  of  the  proposed  program  of  research  has  been  to  answer  the 
following  questions: 

1.  Do  infectious  diseases  exhibit  characteristic  concentration  patterns 
of  metabolites  in  urine? 

2.  Can  a  rapid  and  reliable  pattern  recognition  analytical  technique, 
capable  of  providing  such  diagnostic  information  based  on 
non-fragmenting  mass  spectrometry,  be  developed? 

3.  Can  the  technique  make  the  diagnosis  with  sufficient  sensitivity 
(small  percentage  of  false  negatives)  while  being  highly  specific 
(virtually  nil  false  positives)? 

4. To what extent are the changes in the chemical composition of these
biological fluids indicative of the severity of viral infection?

5.  To  what  degree  can  the  chemical  aberrations  in  urine  be  used  as 
indicators  for  recovery,  and  to  what  degree  can  they  be  useful  for 
the  identification  of  post-infection  carriers? 

6.  Can  viruses  be  detected  through  biochemical  changes  induced  in 
cultured  cells? 

7.  Can  viruses  be  identified  through  characteristic  metabolites  released 
from  tissue  cultured  cells  into  controlled  artificial  media,  within  a 
few  hours  of  incubation? 

8.  Can  the  chemical  identity  of  the  metabolites  constituting  a  given 
diagnostic  pattern  be  determined,  to  allow  their  determination  by 
analytical techniques other than mass spectrometry?


The  first  three  questions  have  been  answered  affirmatively  during  the 
previous  phases  of  the  program  (Clin.  Chem.  22,  1503  (1976)).  The  results 
obtained  recently  corroborate  these  conclusions  in  an  unambiguous  manner. 
Questions  4,  5  and  6  are  still  open.  However,  the  preliminary  results 
obtained  by  the  longitudinal  studies  last  year  are  very  encouraging  - 
demonstrating  a  diagnostically  significant  biochemical  pattern  in  urine  prior 
to  the  onset  of  clinical  symptoms  of  a  viral  infection  -  a  pattern  that  may 
persist  for  some  time  after  the  clinical  symptoms  have  subsided.  The 
preliminary  answer  to  question  6  is  also  positive,  although  much  more  work  is 
required  to  substantiate  these  findings.  The  answers  to  questions  7  and  8  are 
still  pending  and  will  be  addressed  in  the  proposed  next  phase  of  the  program. 

In  the  second  phase  of  this  program  we  proposed  to  accomplish  the 
following: 

1.  Substantiate  the  findings  on  the  rapid  differential  diagnosis  of 
hepatic,  pulmonary,  and  urinary  infections. 

2.  Carry  out  longitudinal  study  on  patients  suffering  from  viral 
infections  from  the  onset  of  the  disease  through  stages  of  recovery 
and  reconvalescence. 

3.  Improve  the  statistical  data  handling  techniques  to  cope  better  and 
faster  with  the  diagnostic  problems,  and,  in  particular,  develop 
pattern  recognition  techniques  capable  of  distinguishing  the  various 
biochemical  factors  detected  in  subjects  with  infectious  diseases. 

Each of these tasks has been accomplished as described in part III of this
report. We have substantiated the preliminary findings that characteristic
metabolic profiles detectable by mass spectrometry are associated with various
infections. We have shown the development in healthy individuals of a
characteristic pattern following infection and preceding the onset of clinical
symptoms. Moreover, this characteristic metabolic profile seems to persist
through the stage of recovery (Clin. Chem. 26, 1443 (1980)). We have made
significant  progress  in  understanding  and  adapting  alternative  diagnostic 
statistical  classification  techniques.  We  have  also  tested  the  possibility  of 
early  detection  of  the  presence  of  a  virus  in  a  tissue  culture  by  the 
monitoring  of  the  medium  and  the  results  are  highly  encouraging,  so  that  we 
plan  to  devote  to  this  problem  a  substantial  part  of  the  next  phase's  effort. 

In  the  second  phase  of  this  program,  we  were  interested  primarily  in 
establishing  the  feasibility  of  the  diagnostic  pattern  recognition  technique, 
both  in  vivo  and  in  vitro.  We  did  not,  however,  study  in  detail  each  of  the 
parameters which determine the quality of the metabolic profiles in order to
find  optimal  conditions  of  speed  of  analysis,  including  data  handling, 
accuracy  and  precision,  and  intersample  memory  effects.  This  is  true  of  all 
the  stages  of  the  analytical  procedures  from  the  stage  of  collection  of  the 
biological  fluid,  its  short  term  storage,  its  long  term  storage,  the 
pre-treatment  of  samples  in  preparation  for  mass  spectrometry  and  then  the 
mass  spectrometry  itself  and  its  computerized  data  processing.  Such  an 
optimization will not only streamline and increase the throughput of the
analytical  laboratory  but  it  will  also  further  improve  the  sensitivity  and 
specificity  of  the  technique  and  make  it  applicable  to  monitor  more  subtle 
pathological  changes. 
III. TECHNICAL BACKGROUND
A.  INTRODUCTION 

During  this  second  phase  we  have  accomplished  the  following  tasks: 

1.  Developed  and  tested  sample  sterilization  and  storage  procedures. 

2.  Developed  new,  simpler  sample  preparation  techniques,  including  one 
to  handle  tissue  culture  media. 

3.  Improved  the  mass  spectrometric  analysis  procedure. 

4.  Studied  and  compared  different  ionization  procedures  and  sample 
handling  techniques. 

5.  Improved  the  interfacing  with  the  INCOS-NOVA  and  the  CYBER  173 
computers. 

6.  Tested  and  evaluated  different  statistical  analysis  classification 
procedures. 

7.  Analysed  samples  of  children  and  adults  with  different  pathological 
problems  and  demonstrated  the  efficiency  of  our  diagnostic  procedure. 

8. Analysed series of samples obtained from virus infected
volunteers  over  a  period  of  4  weeks,  and  demonstrated  the  appearance 
of  a  pathological  pattern  which  disappeared  after  the  disappearance 
of  the  clinical  symptoms.  Different  sample  pre-treatment  procedures 
and  ionization  methods  have  been  tested  in  this  study. 

9.  Analysed  media  of  human  and  animal  tissue  cultures  infected  with 
polio  virus  and  demonstrated  a  diagnostically  useful  pattern  within 
hours  following  exposure  to  the  virus. 

Results  of  tasks  8  and  9  have  been  published  recently  in  Clinical 
Chemistry  (Appendix  A).  However,  this  technical  report  includes  more  recent 
experimental  results. 
B. EXPERIMENTAL TECHNIQUES AND PROCEDURES

The  following  is  an  updated  description  of  our  experimental  techniques  and 
procedures. Although some of the material described below was included in the
previous annual reports, significant changes have since been introduced in a
number of phases of the analytical procedure, enough to warrant reporting the
whole procedure anew.

1.  Sample  Collection  and  Storage  Procedure 

Urine samples were collected by hospital staff in 12 ml plastic collection
tubes with self-sealing caps (Kova-Tubes™) and kept frozen until analysis.
As an enzyme denaturant, bactericide and potential virus inactivator, 0.1 ml
of 0.5 M ZnSO4 or HgCl2 was placed in the tubes given to the hospital or
clinic. This results in a final concentration of approximately 0.005 M
ZnSO4 in the collected urine.
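
The quoted final concentration can be checked with a short dilution calculation. The sketch below is illustrative only: the 10 ml urine volume is an assumption (the text gives only the 12 ml tube capacity), and the function name is hypothetical.

```python
def final_molarity(stock_molarity, stock_volume_ml, sample_volume_ml):
    """Molarity after a small stock volume is diluted by the collected sample."""
    millimoles = stock_molarity * stock_volume_ml  # mol/L * mL = mmol
    total_volume_ml = stock_volume_ml + sample_volume_ml
    return millimoles / total_volume_ml  # mmol/mL = mol/L


# 0.1 ml of 0.5 M stock is from the text; 10 ml of urine is an assumed fill volume.
conc = final_molarity(0.5, 0.1, 10.0)
edta_mmol = 0.5 * 0.1  # equimolar EDTA needed to chelate all of the added metal
print(f"final preservative concentration: {conc:.4f} M")
print(f"EDTA for full chelation: {edta_mmol:.3f} mmol")
```

Under this assumed volume the result is close to the 0.005 M stated above; a smaller collected volume would give a proportionally higher concentration.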

After the sample reaches room temperature, an equimolar amount of EDTA is
added to chelate the excess zinc or mercury ions, and the pH of the sample is
adjusted to 7.2 with HCl or NaOH. Chelation of the excess zinc with EDTA avoids
precipitation of Zn(OH)2 during pH adjustment. Such precipitation could
potentially alter profile patterns by co-precipitating organic constituents in
the urine.

Samples (5 ml aliquots) were thawed and 0.005 M ZnSO4 was added to
samples that were collected without this preservative. The presence of zinc
was found to have a minimal but detectable effect on the molecular weight
profile, so all samples are presently processed with this reagent added. In
the future HgCl2 will be substituted for ZnSO4 in this protocol to
eliminate the possibility of a viable viral contamination (see below).
Droplets of thawed urine are also tested using Ames Multistix reagent
strips  to  assay  pH,  protein,  blood,  glucose,  ketone  bodies,  bilirubin  and 
urobilinogen.  These  results  are  recorded  for  correlation  with  hospital 
testing  of  the  same  sample,  and  to  identify  samples  that  could  be  excluded 
from  data  sets  on  the  basis  of  abnormal  kidney  function. 

In conjunction with the in vitro tissue culture experiments, both ZnSO4
and HgCl2 were tested by Dr. Howard Faden of the Children's Hospital for
their ability to inactivate polio virus (Mahoney polio strain). The viral
solution was supplemented with one or the other of the salts and then treated
with EDTA to remove excess metal ions that were themselves cytotoxic. Viral
solution treated with 5 mM HgCl2 did not cause infection when added to a
culture of human embryonic lung tissue. On the other hand, concentrations of
up to 10 mM ZnSO4 were not effective in inactivating this virus. The latter
findings indicate that HgCl2 is preferable as a sterilizing additive.

A  titration  curve  of  urine  indicated  that  pH  values  less  than  2,  near  7.2 
and  above  10  are  likely  to  yield  reproducible  profile  patterns.  In 
intermediate  pH  regions  the  extraction  yields  of  partially  ionized 
constituents  are  highly  pH  dependent.  In  a  separate  series  of  experiments, 
using  the  Wilcoxon-WNI  test  to  evaluate  the  results,  it  was  determined  that  in 
urine extracts insignificant pattern changes occurred within a pH range of
± 0.5 pH unit of 7.2.

We have studied the effects of various means of sample storage. This
study is outlined in Figure 1. A volume of urine was collected from a healthy
adult male. This volume was divided into 5 ml aliquots that were
alternatively refrigerated, frozen and left standing at room temperature
without preservative, in capped, 12 ml plastic Kova-Tubes™. Three samples
were prepared immediately from the fresh urine. Samples stored under the
conditions outlined in Figure 1 (refrigerated at +5°C and frozen at -20°C,
respectively) were subsequently prepared in triplicate for mass spectrometric
analysis.

Although in our short term study no significant changes were observed in
the mass spectrometric profiles of samples refrigerated up to 6 days or frozen
up to 34 days, later experience gained from urine samples stored at -20°C for
3 to 12 months indicates substantial changes. As a consequence we have recently
changed our storage procedures to use a deep freeze at -76°C.

Two  different  methods  were  employed  to  defrost  the  frozen  samples.  At 
each  time  indicated,  three  tubes  were  allowed  to  defrost  by  standing  at  room 
temperature  for  approximately  45  minutes.  Three  other  tubes  were  more  rapidly 
thawed  by  holding  them  under  running  water.  No  significant  effect  of  the  rate 
of  thawing  was  detected. 
2. Sample Preparation Procedure

Samples are applied to the top of a 6 mm O.D. x 25 cm glass column
containing 18 cm of prewashed Chromosorb P above 2.5 cm of Na2SO4 and a
glass wool plug. The column outlet, a 10 cm length of 0.25 mm I.D. stainless
steel capillary tubing, and a solvent reservoir of 12 cm O.D. x 100 cm glass
tubing attached to the top of the column, are assembled using stainless steel
Swagelok™ unions and Teflon™ ferrules (see Fig. 2). The sample is
applied to the column using nitrogen pressure supplied through a Swagelok™
fitting at the top of the column. The Na2SO4 serves to retain any water
that may emerge from the Chromosorb. After the sample is adsorbed, 5 ml of
dichloromethane are added to the reservoir and forced through the Chromosorb
column under pressure at a flow rate of 0.5 ml/min. The emerging eluate
is continuously absorbed onto a 1 mm x 2 cm strip of Whatman GF/A glass fiber
filter paper at the bottom of a conical tube. The eluate on the paper is
continuously concentrated by a stream of dry N2 directed toward the bottom
of the collection tube. The dried eluate is stored in a glass vial until mass
spectrometric analysis.

The  filter  paper  replaces  a  micro  column  of  chromosorb  previously  used 
(see 1979's report). The glass wool plugs of those sample columns frequently
loosened  and  resulted  in  sample  loss  during  loading  into  the  mass  spectrometer 
inlet  probe.  The  glass  fiber  paper  has  a  lower  background  than  the  chromosorb 
column  and  gives  evaporation  profiles  similar  to  those  obtained  with  the  micro 
columns. 

Samples for chemical ionization mass spectrometry were prepared by
incubating 1 ml aliquots of urine with 0.5 ml of phosphate buffer (pH 6.8) and
0.5 ml urease solution (371 units/ml) for one hour at room temperature.
Following digestion of urea the pH was adjusted to pH 7 with 1 N HCl. An
aliquot of sample equivalent to 1 to 5 microliters of the undiluted urine was
dried on a 2 mm x 1 cm strip of glass filter paper. Samples were allowed to
dry in air at room temperature and were stored for periods of one day or less
at room temperature until analysis.

FIGURE 2

3.  Computerized  Data  Acquisition 

We  have  successfully  interfaced  each  of  our  spectrometers  to  the  FINNIGAN 
2400  data  system.  Recently  we  have  acquired  and  installed  a  dual  acquisition 
system  that  allows  simultaneous  operation  and  data  gathering  on  two  mass 
spectrometers.  The  data  system  generates  a  digital  scan  function  with 
selectable  scan  times  and  upper  and  lower  mass  limits.  This  digital  scan 
function is applied to a 16-bit D/A converter to provide an analog signal to
drive the spectrometer magnet. For use with the double focusing CID
instrument this analog signal is applied to a current programmable magnet
power supply (Alpha Scientific Model 3048) using an Analog Devices model AD284J
isolation  amplifier.  This  configuration,  comprising  a  computer  generated 
digital  scan,  D/A  converter  and  current  programmable  supply,  gives  highly 
reproducible  scans  on  each  instrument  to  which  the  computer  is  interfaced. 
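
To give a feel for the granularity a 16-bit scan function offers, the following sketch estimates the mass change per converter step. It rests on assumptions that are not stated in this report: that the magnet field is proportional to the converter code, that mass scales with the square of the field at fixed accelerating voltage, and that the full-scale code is mapped to a hypothetical top mass of 450 amu. The function names are illustrative.

```python
import math

DAC_BITS = 16
FULL_SCALE = 2**DAC_BITS - 1  # 65535, largest code of a 16-bit converter


def code_for_mass(mass, top_mass):
    """DAC code for a mass, assuming field ~ code and mass ~ field**2."""
    return round(FULL_SCALE * math.sqrt(mass / top_mass))


def mass_step_per_code(mass, top_mass):
    """Approximate mass change (amu) produced by one DAC step at a given mass."""
    code = FULL_SCALE * math.sqrt(mass / top_mass)
    return 2.0 * mass / code  # derivative of mass = top_mass * (code / FS)**2


TOP_MASS = 450.0  # hypothetical: full-scale code mapped to the upper scan limit
for m in (73.0, 298.0, 450.0):
    print(m, code_for_mass(m, TOP_MASS), round(mass_step_per_code(m, TOP_MASS), 4))
```

Under these assumptions one converter step corresponds to roughly 0.01 amu or less across the range, so the digital scan itself would not be the limiting factor for unit mass assignment.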

For  field  ionization  operation,  the  accelerating  voltage  is  monitored  with 
a  4  1/2  digit  digital  voltmeter  and  maintained  at  the  same  value  (nominal 
5000V)  at  the  beginning  of  each  sample  run  to  provide  the  same  mass-to-time 
function for each analysis. The detector output is sampled by a 12 bit A/D
converter with computer controlled integration times of 25 to 200 microseconds
(see Fig. 3).

Mass assignment is achieved by a time-to-mass calibration curve
established with a known mixture of reference materials. The calibration
algorithm utilizes a higher order curve fitting, capable of accurate
interpolation and extrapolation for mass assignment, provided the spectrometer
scan function is reproducible. In our system the digital scan function and
the stable current programmed power supply give mass assignment stability
within 0.3 amu from sample to sample. Mass assignment drifts of less than 1
amu are generally maintained over several days.

FIGURE 3. MASS SPECTROMETER-DATA SYSTEM BLOCK DIAGRAM

In  spite  of  the  inherent  mass  assignment  reproducibility  we  recalibrate 
the  instrument  for  each  sample.  Calibration  for  field  ionization  is  more 
difficult  than  with  electron  impact.  Due  to  the  absence  of  fragment  ions,  we 
must  calibrate  the  mass  range  with  a  mixture  of  compounds  with  similar  vapor 
pressures,  in  order  to  obtain  spectra  with  all  calibration  peaks  present  in  a 
single scan. Presently we use a mixture of seven compounds (see Figure 4)
covering a molecular weight range from m/e 73 to m/e 298. Using an instrument
with a nominal resolution of 500, the calibration with this mixture is
generally accurate to 0.05 amu at mass 300. Assignment of masses outside the
73-298 amu range is by extrapolation of the computed calibration curve.
Extrapolated assignment of masses up to 100 amu above m/e 298 is accurate and
stable to ± 0.2 amu. This is measured by observing the assignment of a high
M.W. peak (e.g. cholesterol MW 386) using several different calibration
curves.
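
The idea of a time-to-mass calibration curve with extrapolation can be illustrated with a small least-squares polynomial fit. This is a sketch under stated assumptions, not the data system's actual algorithm: the quadratic scan law, the polynomial degree, and five of the seven reference masses are hypothetical (only m/e 73 and 298.28 come from the text), and the fitting routine is a plain normal-equations solver written for the example.

```python
import math


def polyfit(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest order first) via normal equations."""
    n = degree + 1
    # Build the normal equations A c = b, where A[i][j] = sum(x**(i+j)).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def polyval(coeffs, x):
    """Evaluate a polynomial given lowest-order-first coefficients."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


K = 450.0 / 12.0 ** 2  # toy scan law (assumed): mass = K * t**2, 450 amu at 12 s
ref_masses = [73.0, 110.0, 150.0, 190.0, 230.0, 265.0, 298.28]  # mostly hypothetical
ref_times = [math.sqrt(m / K) for m in ref_masses]
coeffs = polyfit(ref_times, ref_masses, degree=2)

t_chol = math.sqrt(386.0 / K)  # arrival time of a peak at mass 386 in the toy scan
assigned = polyval(coeffs, t_chol)
print(f"extrapolated assignment at t={t_chol:.3f} s: {assigned:.3f} amu")
```

Because the toy data follow the assumed scan law exactly, the extrapolated assignment recovers mass 386; with real, noisy peak times the extrapolation error would grow with distance from the calibrated range, which is why the text checks it against a known high-mass peak.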

For  analysis  of  urine  extracts  the  mass  spectrometer  is  scanned  from  1  to 
450  amu  in  12  seconds.  The  total  scan  cycle  time  is  17  seconds  including  the 
time  spent  on  returning  to  the  starting  mass  and  allowing  for  field 
stabilization.  Samples  are  placed  in  the  solid  probe  which  is  placed  in 
contact  with  the  ion  source.  Data  acquisition  starts  as  soon  as  the  sample  is 
in position near the ion source. The sample is maintained at 20°C using air
cooling of the probe during sample introduction and following initial contact
with the ion source, which is maintained at 200°C. The probe temperature is
then programmed to increase linearly from 20°C to 200°C over 20 min, and is
held at 200°C for 5 minutes.
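
As a consistency check, the scan cycle time and the probe temperature program above fix how many scans fit into one run. The sketch below uses only figures quoted in this section; the function and variable names are illustrative.

```python
RAMP_MIN = 20.0   # linear ramp duration, minutes (from the text)
HOLD_MIN = 5.0    # hold at 200 C, minutes (from the text)
CYCLE_S = 17.0    # total scan cycle time, seconds (from the text)
T_START, T_END = 20.0, 200.0  # probe temperature limits, deg C


def probe_temperature(t_s):
    """Probe temperature (deg C) at t_s seconds into the run."""
    ramp_s = RAMP_MIN * 60.0
    if t_s >= ramp_s:
        return T_END
    return T_START + (T_END - T_START) * t_s / ramp_s


run_s = (RAMP_MIN + HOLD_MIN) * 60.0
n_cycles = run_s / CYCLE_S
rate = (T_END - T_START) / RAMP_MIN  # deg C per minute during the ramp
print(f"{n_cycles:.1f} scan cycles, ramp {rate:.0f} C/min, "
      f"{rate * CYCLE_S / 60.0:.2f} C per scan")
```

The run accommodates about 88 scan cycles, in line with the roughly 90 scans the data system is described as storing, and the probe warms by about 2.5°C between consecutive scans during the ramp.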

The  data  system  records  and  stores  90  individual  mass  scans  during  the 
sample  volatilization.  These  scans  may  be  displayed  in  graphical  or  in 
tabular format during acquisition. The data is stored in an accurate mass
format with the mass of each peak assigned to within ± 60 ppm amu by the
current calibration file. At the end of the acquisition period, the
individual  scans  are  summed  following  conversion  of  the  accurate  masses  to 
nominal  masses,  and  this  integrated  spectrum  is  used  in  all  subsequent  data 
processing  steps.  Immediately  following  the  urine  sample,  a  new  calibration 
sample  is  introduced  and  10  scans  acquired  over  a  temperature  increase  from 
room  temperature  to  approximately  70°C.  Due  to  the  high  volatility  of  some 
components  in  the  low  molecular  weight  range,  large  changes  in  the  relative 
peak  intensities  occur  during  the  calibration  run,  and  only  the  scans  which 
have  sufficient  intensity  of  all  components  of  the  mixture  are  utilized  for 
calibration. 

The drift in mass assignment of one of the components, methyl stearate, MW
298.28, is compared between this and the previous calibration file. This peak
has the smallest intensity and the highest mass in the mixture, so that its
mass assignment is subject to the largest variation due to ion statistics and
scan irreproducibility. Differences in mass assignment for two consecutive
calibration runs are typically within 0.1 amu.

After  calibration  the  source  and  probe  are  baked  out  to  remove  any 
residue,  in  preparation  for  the  next  sample.  The  entire  sequence  of 
calibration,  bakeout,  and  sample  analysis  requires  about  one  hour.  At  least 
once  each  day  the  sensitivity  of  the  ion  source  is  measured  by  introducing  1 
microgram  of  adenine,  and  acquiring  data  with  the  same  scan  sequence  used  for 
urine  samples.  The  integrated  signal  for  this  sample  is  required  to  be 
greater than 5000 ions. Correcting for the percentage of scan time actually
spent monitoring the molecular ion (10 ms/17 sec), this is equivalent to
10^-12 Coulombs/microgram. Over the actual scan range employed this implies
a minimum of 100 ions collected for 20 ng of a single component in an actual
sample mixture. Examples of calibration and sample data are given in Figs.
5-9.
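
The sensitivity figures quoted above follow from a short duty-cycle calculation, reproduced here as a sketch. All inputs are taken from the text except the elementary charge, which is a physical constant; the variable names are illustrative.

```python
E_CHARGE = 1.602e-19  # elementary charge, coulombs

ions_per_ug = 5000.0            # required integrated adenine signal per microgram
dwell_s, cycle_s = 0.010, 17.0  # time on the molecular ion per scan cycle

duty = dwell_s / cycle_s                 # fraction of time spent on the peak
ions_if_continuous = ions_per_ug / duty  # ions per microgram with no scanning loss
coulombs_per_ug = ions_if_continuous * E_CHARGE
ions_for_20_ng = ions_per_ug * 0.020     # 20 ng = 0.020 microgram

print(f"duty cycle {duty:.2e}, sensitivity {coulombs_per_ug:.1e} C/ug, "
      f"{ions_for_20_ng:.0f} ions for 20 ng")
```

The result is about 1.4 x 10^-12 C per microgram, which matches the order of magnitude stated, and 100 ions for a 20 ng component.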

A  standard  urine  extract  is  also  analyzed  at  least  once  a  week  as  an 
overall  test  of  instrument  performance.  This  sample  is  a  10  microliter 
aliquot of a concentrate obtained from the extraction of 20 ml of urine
using  a  larger  column  appropriately  scaled  up  in  size  to  cope  with  a  bulk 
sample. 

Initially,  profiles  were  recorded  using  a  single  focusing  magnetic  sector 
spectrometer  (see  1979's  report).  In  certain  samples,  particularly  from 
patients  with  liver  disorders,  we  encountered  mass  regions  where  ions  appeared 
with  a  continuum  of  energies  covering  several  amu  of  the  mass  scale.  Further 
analysis  revealed  that  many  of  these  ions  possessed  energy-to-charge  ratios 
greater  than  the  main  beam  energy.  These  ions  are  a  result  of  the  formation 
of  doubly  charged  species,  which  upon  collision  with  neutral  molecules  either 
gain  an  electron  or  decompose  to  smaller  fragments.  In  our  instrument  the 
most  likely  place  for  such  collisions  to  occur  is  in  the  lens  between  the 
source  and  the  object  slit.  This  region  has  the  highest  density  of  neutral 
molecules as well as of focused decelerated ions. The distribution of
energies arises from the fact that these processes may occur anywhere between
the ionizer and the object slit, in a region with a potential gradient. The
final energy of the detected ion will be a function of the position where the
transition from doubly charged to singly charged species occurred. The
acceleration of the ion as a doubly charged species, even over a small
fraction of the net accelerating voltage, will result in an ion with a final
energy greater than the main beam energy. In order to obtain unit mass
resolution of the singly charged normal molecular ions, a double focussing
instrument employing an energy analyzer is required. Since the extent of
these  processes  is  variable  and  may  occur  in  any  sample,  we  now  analyze  all  FI 
samples  on  the  reverse  geometry  double  focussing  instrument  (the  CID 
instrument)  described  in  the  1979  report. 
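
The energy argument above can be made concrete with a simplified model. The sketch assumes an ion that traverses a fraction f of the accelerating potential V while doubly charged, gains an electron, and completes the remainder singly charged, so its energy is eV(1 + f). The apparent-mass relation is the standard momentum-analysis result for a magnetic sector calibrated for the main beam; it ignores the lens geometry and the decomposition pathway, and the function names are hypothetical.

```python
def final_energy_ev(accel_voltage, fraction_doubly_charged):
    """Kinetic energy (eV) of an ion that travels the first fraction of the
    accelerating potential doubly charged and the remainder singly charged."""
    f = fraction_doubly_charged
    return 2.0 * f * accel_voltage + 1.0 * (1.0 - f) * accel_voltage


def apparent_mass(true_mass, accel_voltage, fraction_doubly_charged):
    """Mass at which a magnetic sector calibrated for the main beam would
    register this ion (apparent mass scales with kinetic energy)."""
    energy = final_energy_ev(accel_voltage, fraction_doubly_charged)
    return true_mass * energy / accel_voltage


V = 5000.0  # nominal accelerating voltage quoted for field ionization
for f in (0.0, 0.005, 0.01, 0.02):
    print(f, final_energy_ev(V, f), round(apparent_mass(300.0, V, f), 2))
```

In this model a charge change after only 1 per cent of the potential shifts a mass 300 ion to an apparent mass of about 303 on a single focusing instrument, consistent with the continuum covering several amu described above.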

Chemical  ionization  mass  spectra  were  obtained  on  a  DuPont  21-491B  mass 
spectrometer  equipped  with  a  chemical  ionization  source  using  isobutane 
reagent gas at a source pressure of 0.3 to 0.5 torr. Mass resolution of 500
and a 10 sec cyclic scan from 1 to 550 amu gives integrated counts of 1.5 to
$2 \times 10^{5}$ for 1 microgram of adenine evaporated from the probe, corresponding to
3  x  10~^  coulombs/microgram  sensitivity.  For  analysis  of  urease  digested 
urine  samples  the  solid  probe  input  power  is  programmed  to  evaporate  the 
entire  sample  within  45  scans  in  7.5  minutes.  The  Finnigan  model  2400  data 
system  is  used  also  with  this  instrument  to  acquire  and  store  spectra. 

4.  Data  Reduction  and  Transmission 

The  mass  spectra  acquired  are  processed  locally  to  facilitate  transmission 
to  the  University's  main  computer  (Control  Data  Corporation  CYBER  173).  A 
second  set  of  operations  in  the  CYBER  then  follows,  and  results  in  spectral 
data  suitable  for  the  subsequent  statistical  operations. 

Within  our  dedicated  Finnigan/INCOS  computer  system  the  individual  mass 
spectrometer  scans  are  stored  as  acquired.  Upon  completion  of  the  run  a 
single  spectrum  is  produced  by  addition  of  all  scans.  This  spectrum  is  then 
translated  into  a  Fortran-readable  format  suitable  for  transmission  to  the 
CYBER  main  computer  over  a  1200  baud  multiplexor  phone  link. 

When the spectra are on file in the CYBER, format and simple logical
checks  are  made  on  the  data,  which  are  then  corrected  as  required.  Each 
spectrum  is  then  normalized  to  unit  total  area,  with  individual  peak  areas 
greater  than  5  per  cent  of  the  total  excluded  from  the  normalization  sum  (for 
the  rationale  see  1979  report).  All  statistical  analysis  is  then  accomplished 
using  these  normalized  spectra. 
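
The normalization step can be sketched as follows. This is a minimal Python illustration, not the CYBER program itself: the function name, the dictionary layout, and the choice to rescale the excluded large peaks by the same sum (rather than drop them) are assumptions made here.

```python
def normalize_spectrum(peaks, cutoff=0.05):
    """Normalize a summed spectrum to unit total area.

    peaks  : dict mapping nominal mass -> summed peak area
    cutoff : peaks above this fraction of the raw total are left out of
             the normalization sum (5 per cent in the text)
    """
    total = sum(peaks.values())
    if total <= 0:
        raise ValueError("spectrum has no signal")
    # Only peaks at or below the cutoff contribute to the normalization sum.
    norm_sum = sum(a for a in peaks.values() if a <= cutoff * total)
    if norm_sum <= 0:
        raise ValueError("no peaks left after excluding large peaks")
    # Assumption: every peak, including the excluded ones, is divided by
    # the same sum, so the retained peaks add up to 1.
    return {mass: area / norm_sum for mass, area in peaks.items()}
```

With this reading, a single dominant peak does not compress the relative areas of all the remaining peaks.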

C.  DIAGNOSTIC  STATISTICAL  ANALYSIS. 

1.  Introduction

During this second phase we have devoted a substantial effort to evaluating
our diagnostic statistical analysis, comparing a number of alternative
techniques for the selection of variables and for the separation of cases into
diagnostically meaningful groups.

The  statistical  analysis  in  this  project  has  the  following  objectives: 

a.  To  separate  the  biological  samples  (cases)  into  statistically 
distinct  groups  correlated  with  the  disorder  of  interest  (e.g. 
healthy vs. diseased, healthy vs. alcoholic liver disease, pneumonia
vs.  bronchitis,  bacterial  vs.  viral  pneumonia,  etc.)  by  a 
characteristic  set  of  variables. 

b.  To  assign  correctly  an  unknown  case  to  one  of  a  number  of 
pre-specified groups, which have been previously developed on the
basis  of  a  continuously  increasing  learning  set. 

c.  To  identify  the  variables  that  best  characterize  a  given  pathological 
state,  in  order  to  facilitate  the  understanding  of  the  biochemical 
nature  of  the  disease  and  possibly  also  to  explore  the  possibility  of 
quantitative  assay  of  the  particular  metabolite  by  a  simpler  non-mass 
spectrometric  analytical  technique. 


d.  To  identify  variables  that  co-vary,  for  two  reasons  -  first  to  attain 
a  better  understanding  of  the  biochemical  nature  of  the  disease,  and 
second  to  minimize  undue  weighting  bias  in  the  differentiation  of 
cases  into  diagnostic  groups. 

Different  multivariate  analysis  techniques  have  provided  us  with  answers 
to  these  questions  with  different  degrees  of  efficiency.  As  stated  above,  we 
have  started  to  compare  different  statistical  techniques  for  their  merits  in 
meeting  our  objectives.  In  the  following  sections  we  shall  discuss  the 
current  status  of  this  evaluation,  which  will  be  continued  during  the 
forthcoming  stages  of  this  project. 

We  may  separate  our  current  statistical  analysis  into  two  phases  -  the 
selection  and  rating  of  variables  (mass  spectral  peaks)  according  to  their 
diagnostic value, and the classification of cases (patients' urine or tissue
culture  media)  into  groups. 

2.  Selection  and  Rating  of  Variables 

Each  normalized  mass  spectrum  comprises  hundreds  of  variables  (peaks)  each 
of  which  represents  the  concentration  of  a  metabolite  (or  a  group  of 
metabolites  sharing  the  same  nominal  mass)  in  the  biological  sample.  When  we 
compare  the  magnitudes  of  a  given  variable  in  samples  coming  from  two 
biochemically  distinct  groups  we  observe  three  types  of  variation: 

-  variation  due  to  the  analytical  procedure 

-  variation  due  to  biological  variance  (due  to  genetic  or  nutritional 
factors),  and 

-  variation  associated  with  the  experimental  difference  between  the 
groups,  e.g.  variation  due  to  the  pathological  status  of  a  human 
subject,  or  the  infected  status  of  a  tissue  culture. 

Ideally there should be no variation of the first kind and the variation
due to the pathological status should be far larger than the biological
variation.  In  reality,  however,  the  majority  of  variables  show  a  large 
biological  variation  (on  top  of  a  finite  experimental  variance)  and  are  thus 
of  minimal  diagnostic  value  (we  shall  call  these  diagnostically  useless 
variables).  Our  problem  is,  therefore,  to  select  those  variables  which  may  be 
diagnostically  useful. 

Any  statistical  diagnostic  procedure  will  become  ineffective  if  given  an 
excessive  number  of  diagnostically  useless  variables,  even  in  the  presence  of 
many  useful  ones.  On  the  other  hand,  we  would  like  to  utilize  every  useful 
variable  since  each  of  these  increases  the  diagnostic  power  of  the 
classification  procedure.  An  acceptable  variable  selecting  procedure  has, 
therefore,  to  identify  and  reject  useless  variables,  while  retaining  all  the 
useful ones. Moreover, since there will always be variables more "useful" than
others, an adequate procedure should rate them accordingly, and thus allow us
to optimize the diagnostic power of the classification procedure.

There  is  another  factor  that  should  be  taken  into  account  -  covariance. 

In  a  biological  system  there  are  many  variables  that  are  biochemically 
interrelated,  so  that  their  variation  associated  with  a  given  pathological 
state  is  interdependent.  If  such  variables  are  used  in  a  statistical  method 
based  on  a  pattern  of  a  given  number  of  independent  variables,  they  may  bias 
the result by giving an unduly high weight to a single variation (accompanied
by  a  set  of  dependent  variables).  It  is  important,  therefore,  to  identify 
such  co-variances  and  eliminate  the  satellite  variables  from  the  diagnostic 
classification  pattern.  A  desirable  feature  of  a  selecting  technique  would 
be,  therefore,  the  ability  to  identify  covariance  and  minimize  its  effect. 
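
One simple way to screen for such satellite variables is sketched below. It is an illustration under stated assumptions, not a procedure described in this report: the Pearson correlation coefficient is used as the measure of co-variation, the 0.9 limit is arbitrary, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variable carries no co-variation
    return sxy / math.sqrt(sxx * syy)


def drop_satellites(columns, ranked_names, limit=0.9):
    """Keep a variable only if it is not strongly correlated with one already kept.

    columns      : dict variable name -> list of values, one per case
    ranked_names : variable names, most useful first
    """
    kept = []
    for name in ranked_names:
        if all(abs(pearson(columns[name], columns[k])) < limit for k in kept):
            kept.append(name)
    return kept
```

Because the variables are visited in order of usefulness, the member of a correlated set that is retained is the one ranked highest.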


a.  The  Wilcoxon  Test 


This  non-parametric  ranking  of  variables  according  to  the  probability  of 
being  constituents  of  the  same  population  has  been  described  in  our  previous 
reports.  This  ranking  has  two  important  shortcomings:  first,  it  does  not 
identify  covariance,  and  second,  it  will  not  discriminate  against  artifactual 
deviants. In fact this treatment may give a variable (peak) with an unlikely
large deviation in one of the cases (spectra) an unduly low
probability. This second shortcoming is avoided in the t-test variable rating.

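
A rank-based rating of this kind can be sketched as follows. This is a generic two-sided Wilcoxon rank-sum test using the normal approximation without a tie correction, written here for illustration; it is not the program described in the previous reports, and the function names and data layout are assumptions.

```python
import math


def rank_sum_p(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Find the run of tied values starting at i and give each the mid-rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        mid = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = mid
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def rank_peaks(group1, group2):
    """Return (mass, p) pairs ordered from lowest to highest p value."""
    p = {m: rank_sum_p(group1[m], group2[m]) for m in group1}
    return sorted(p.items(), key=lambda item: item[1])
```

The normal approximation is only a rough guide for very small groups; exact tables would be preferable there.
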
b.  The  t-test  Rating 

We  have  applied  the  t-test  program  P3D  of  the  BMDP  programs  package  (UCLA 
1977) to test the null hypothesis that each of the variables in two
tested  groups  of  cases  belongs  to  the  same  population.  Unlike  in  the  Wilcoxon 
test  this  probability  is  calculated  on  the  basis  of  deviations  from  the 
group's  average,  while  taking  into  account  the  difference  in  values  between 
the  two  group  averages.  This  program  also  provides  us  with  the  variance  of 
each  variable  in  each  group,  so  that  artifactual  deviants  can  be  readily 
identified  and  discounted. 
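
The quantities involved can be illustrated with a short sketch. It is not the P3D program: it computes only the t statistic (here in the unequal-variance, or Welch, form, which is an assumption) together with the two group variances, and the function name is hypothetical.

```python
import math
from statistics import mean, variance


def t_rating(x, y):
    """Return (t, var_x, var_y) for one variable measured in two groups."""
    vx, vy = variance(x), variance(y)  # sample variances (n - 1 denominator)
    se = math.sqrt(vx / len(x) + vy / len(y))
    if se == 0:
        raise ValueError("both groups are constant; t is undefined")
    t = (mean(x) - mean(y)) / se
    return t, vx, vy
```

A single artifactual deviant inflates the variance of its group, which both lowers |t| and makes the case easy to spot when the variances are inspected.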

In  spite  of  the  intrinsic  differences  between  the  Wilcoxon  and  the  t-test, 
the  two  programs  identified  50  out  of  400  peaks  from  the  same  test  set  of 
spectra  as  of  prime  diagnostic  value  with  an  overlap  of  over  90%  of  the 
variables selected (although the order of ranking by the two procedures was
somewhat  different).  In  view  of  this  finding  and  since  the  t-test  provides 
additional  useful  information  we  prefer  now  to  use  this  program  for  selection 
of  the  diagnostic  peaks. 


c.  The  Stepwise  Discriminant  Analysis  Procedure. 

This procedure, to be discussed below, selects and rank-orders variables
according  to  their  "F"  values.  The  F  value  for  each  variable  is  proportional 
to  the  square  of  the  intergroup  difference  and  inversely  proportional  to  the 
square  of  the  intragroup  variance  around  that  group's  average.  Since  this 
procedure  requires  considerably  more  computer  capacity  than  the  two  preceding 
methods,  it  can  handle  just  a  limited  number  of  variables  (50  peaks  in  our 
case).  This  limitation  requires  pre-selection  of  variables  by  one  of  the 
preceding  techniques,  to  be  followed  by  their  F  value  ranking  according  to 
their  usefulness  in  separating  the  cases  into  distinct  groups.  The  main 
feature  of  this  statistical  procedure  is  its  ability  to  identify  and  reject 
co-variants.  In  spite  of  this  important  advantage,  the  discriminant  analysis 
can  hardly  be  considered  a  practical  peak  selecting  procedure,  because  of  its 
high demand on computer time and capacity. However, since this technique is
being  used  as  a  group  classification  and  case  assignment  procedure,  the  peak 
selection  and  ranking  according  to  the  F  values  may  be  considered  as  a  fringe 
benefit. 

d.  Modes  of  Use  of  Selected  Variables 

The  variables  for  a  given  classification  procedure  can  be  selected  by 
virtue  of  meeting  a  certain  arbitrary  criterion  (e.g.  having  a  p  value  below  a 
given  value),  or  by  rank  ordering  according  to  a  given  criterion  (e.g. 
starting  with  the  variable  of  lowest  p  value,  followed  by  the  next  lowest,  and 
so  on)  and  then  picking  an  arbitrary  number  (e.g.  50)  having  the  lowest  p 
values.  The  selected  variables  can  then  be  used  in  the  group  classification 
procedures  without  any  weighting. 
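
The two selection modes can be stated compactly. The sketch below is illustrative only; the function names and the dictionary of p values are hypothetical.

```python
def select_by_threshold(p_values, p_max):
    """Variables whose p value is below p_max, lowest p first."""
    chosen = [m for m, p in p_values.items() if p < p_max]
    return sorted(chosen, key=p_values.get)


def select_by_rank(p_values, count):
    """The `count` variables of lowest p value, lowest p first."""
    return sorted(p_values, key=p_values.get)[:count]
```

The first mode returns a variable number of peaks for each data set, while the second always returns the same number regardless of how small the p values are.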


Alternatively a characteristic classification parameter (e.g. the p value)
can be used as a weighting factor. Since the diagnostic usefulness increases
as p decreases, using 1/p as a weighting factor is perhaps the simplest
weighting procedure. This weighting will, however, give variables with very
small p values a very high weight. Alternative weighting factors could be, for
instance, $1/\sqrt{p}$, $1/\sqrt[3]{p}$, or $\log(1/p)$, which would decrease the overweighting
of variables with very small p values.
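
The effect of the choice of weighting function can be seen numerically. The sketch below compares how strongly each candidate function favours a variable of very small p over one of moderate p. The square-root and cube-root forms are a reading of the formulas in the text, the base-10 logarithm is an assumption, and the names are hypothetical.

```python
import math

# Candidate weighting functions of the p value.
WEIGHTINGS = {
    "1/p": lambda p: 1.0 / p,
    "1/sqrt(p)": lambda p: p ** -0.5,
    "1/cbrt(p)": lambda p: p ** (-1.0 / 3.0),
    "log(1/p)": lambda p: math.log10(1.0 / p),
}


def weight_ratio(name, p_small, p_large):
    """Weight of the low-p variable relative to the higher-p variable."""
    f = WEIGHTINGS[name]
    return f(p_small) / f(p_large)
```

For p values of 1e-6 and 1e-2 the ratio falls from about 10,000 with 1/p to about 100 with the square root, about 22 with the cube root, and 3 with the logarithm.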

There  are  advantages  to  either  peak  selection  procedure.  The  discriminant 
method  (by  an  arbitrary  cut-off)  requires  human  judgement  for  each  set  of 
cases.  This  shortcoming  can  be  eliminated  by  rank  order  cut-off.  Using  the 
t-test  one  can  use  the  variance  in  addition  to  the  t  or  p  values  as  a  second 
criterion  in  selection. 

The  weighting  procedure  while  free  from  subjective  intervention 
nevertheless  requires  optimization  to  obtain  the  best  use  of  separating 
variables.  However,  the  weighting  of  variables  as  in  the  WNI  procedure  (see 
our  1979  report)  involves  the  choice  of  an  arbitrary  weighting  function  which 
is  at  best  a  compromise  between  an  optimized  use  of  the  variables  and  a  use 
based  on  a  subjective  threshold. 


During  the  previous  phase  of  this  program  we  have  used  basically  just  one 
classification  procedure,  namely  the  weighted  non-parametric  index  (WNI) 
method  which  has  been  described  in  our  previous  reports. 

This procedure, which is simple and straightforward, has certain
limitations. First, it is applicable to only two groups of variables.
Although a plot of WNI (1) vs. WNI (2) can reveal some secondary
clustering, indicating sub-groupings, the separation between these is
biased by the original choice of two groups, since the average value of each
variable in each group is the reference point for the WNI calculation. Second,
when small sets of cases are analysed, the WNI values are strongly influenced
by variables with large variances from their respective group averages. This
is especially true when the difference between the group averages is relatively
small. Third, the diagnostic reference point on the D = WNI (1) - WNI (2)
scale is arbitrary, which becomes problematic if the D values for members of
the two groups form a progressive continuum without a significant gap between
the D's of the two groups. Fourth, this "non-parametric" procedure does not
provide us with a measure of the probability of a given case belonging to each
group; it thus lacks a quantitative measure for the diagnostic assignment
of a given case to a particular group.

In  view  of  these  limitations,  we  have  experimented  with  two  other 
classification  procedures  -  the  clustering  analysis  procedure  (P2M  procedure 
BMDP  UCLA,  1977)  and  the  stepwise  discriminant  analysis  procedure  (P7M  BMDP 
UCLA,  1977).  The  former  classification  is  free  of  the  bias  of  assignment  of 
cases  to  a  particular  number  of  groups,  whereas  the  latter  can  handle 
efficiently a large number of pre-specified groups and it provides the
probability  of  each  case  belonging  to  any  of  the  groups  in  question. 


In the clustering analysis procedure one represents each of the cases (each with n variables)
as a point on a surface of an n dimensional space. This is done by calculating
an n dimensional vector as a resultant of the values of the variables measured
on n orthogonal coordinates. If we have many cases, each constituting a point
on the n dimensional surface, we can calculate the distance on this surface
between any given two points. The clustering procedure selects the two points
with the shortest n dimensional Euclidean distance between them, producing a
cluster  of  two.  Then  the  program  tries  to  find  among  all  the  remaining  points 
a  (third)  point  (case)  closest  to  the  first  two  points,  forming  a  cluster  of 
3.  Again  the  program  selects  and  registers  a  (fourth)  point  closest  to  the 
cluster  of  three,  and  so  on. 

When  the  distance  between  the  growing  cluster  and  the  next  point  is  larger 
than  between  a  pair  of  the  remaining  points,  a  new  cluster  of  two  is  selected, 
which  can  again  grow,  aggregating  points  in  its  vicinity.  This  process 
continues  until  the  nearest  distance  remaining  is  the  distance  between  the 
boundaries of two clusters, which are then registered as a cluster of
clusters.  The  classification  is  ended  when  all  points  are  accounted  for,  when 
a  master  cluster  containing  all  the  points  (all  the  cases)  is  registered. 
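
The amalgamation just described can be sketched in a few lines. The code below is a nearest-neighbour (single-linkage) reading of that description and is offered as an illustration only. It is not the P2M program, whose linkage rule may differ, and the function names and data layout are assumptions.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def amalgamate(cases):
    """Merge the two closest clusters repeatedly until one cluster remains.

    cases : dict case label -> list of n variable values
    Returns a list of (members_a, members_b, distance) in amalgamation order.
    """
    clusters = [frozenset([label]) for label in cases]
    history = []

    def linkage(pair):
        # Distance between clusters = shortest distance between any two members.
        a, b = pair
        return min(euclidean(cases[i], cases[j]) for i in a for j in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        history.append((sorted(a), sorted(b), linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return history
```

The returned history plays the role of the amalgamation order: early entries are tight pairs, and the last entry joins everything into the master cluster.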

This  approach  is  completely  bias-free  as  far  as  the  number  of  groups 
(clusters)  it  will  form  from  a  set  of  cases;  only  after  clustering  can  one 
check  the  a  priori  assignment  of  a  given  case  against  the  cluster  it  ended  up 
in.  The  program  also  allows  the  identification  of  the  relative  positions  of 
points  (cases)  within  clusters.  On  the  other  hand,  the  procedure  does  not 
test  the  variables  for  their  variances  from  a  group  average  (which  is  done  in 
the  WNI  and  in  the  discriminant  analysis  classifications)  or  for  co-variance, 
which  is  performed  in  the  discriminant  analysis. 

The Stepwise Discriminant Analysis

This classification procedure separates the cases into a prespecified
number of groups after analyzing the variance of each variable. This
procedure also selects those variables which separate the cases into the
specified groups most effectively.

The procedure first determines the variance of each variable within each
group and compares it to the variances between groups. The comparison is done
by calculating F values, i.e. dividing the square of the intergroup variance
($S_G$) by the square of the variances ($S_V$) of the individual variables
around the corresponding group average: $F = S_G^2 / S_V^2$. The program
then selects the variable with the highest F value and if this is larger than
a pre-specified threshold "F-to-enter" it will use this variable to classify
the cases into groups.
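
For a single variable, a ratio of this kind is commonly computed as the one-way analysis-of-variance F statistic. The sketch below uses that standard form. Whether it matches the report's $S_G$ and $S_V$ exactly is an assumption, and the function name is hypothetical.

```python
from statistics import mean


def f_value(groups):
    """One-way F ratio for one variable.

    groups : list of lists, the variable's values in each group
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    group_means = [mean(g) for g in groups]
    grand = mean([v for g in groups for v in g])
    # Between-group mean square: spread of the group averages.
    between = sum(len(g) * (m - grand) ** 2
                  for g, m in zip(groups, group_means)) / (k - 1)
    # Within-group mean square: spread of cases around their own group average.
    within = sum((v - m) ** 2
                 for g, m in zip(groups, group_means) for v in g) / (n_total - k)
    if within == 0:
        return float("inf")
    return between / within
```

A variable enters the classification only if its F value exceeds the chosen "F-to-enter" threshold.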

In  the  second  step  it  will  test  all  remaining  variables  for  their  ability 
to  separate  between  groups  on  a  2  dimensional  surface,  comparing  again  the  new 
intergroup  variance  to  the  group  variances.  The  variable  with  the  highest  "F 
to  enter"  value  at  this  step  will  be  then  added  to  the  first  selected 
variable,  provided  its  F  value  exceeds  the  "F  to  enter"  threshold  and  provided 
that  a  correlation  coefficient  between  the  two  variables  is  not  above  a 
specified  limit.  A  high  correlation  would  obviously  invalidate  the  group 
classification  by  two  presumably  independent  variables. 

After the two classifying variables are selected, the program computes for
each case a point (vector) on a 3 dimensional surface using each of the
remaining variables combined with the 2 variables selected in steps 1 and 2.
The variable with the highest F value is selected, provided it exceeds the
threshold and that it does not co-vary with the combination of the two first
variables. If it does not fulfill the second condition the variable with the
next highest F value is selected and tested against the two criteria. The
variable selection procedure is continued until all remaining variables end up
with F values below the threshold or exhibit excessive covariance with the set
of selected variables.
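
The entry logic can be illustrated with a deliberately simplified sketch. Unlike the P7M program, it ranks variables once by their single-variable F value and screens each candidate by pairwise correlation with the variables already entered. It does not recompute F values conditionally at each step and has no "F to remove" stage. All names and default thresholds are assumptions.

```python
import math
from statistics import mean


def f_value(groups):
    """One-way F ratio for one variable (list of per-group value lists)."""
    k, n = len(groups), sum(len(g) for g in groups)
    means = [mean(g) for g in groups]
    grand = mean([v for g in groups for v in g])
    between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means)) / (k - 1)
    within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (n - k)
    return float("inf") if within == 0 else between / within


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def forward_select(data, labels, f_to_enter=4.0, r_limit=0.95):
    """Greedy forward selection of variables.

    data   : dict variable name -> list of values, one per case
    labels : group label of each case, in the same order
    """
    names = sorted(set(labels))

    def split(values):
        return [[v for v, g in zip(values, labels) if g == name] for name in names]

    scores = {var: f_value(split(vals)) for var, vals in data.items()}
    selected = []
    for var in sorted(scores, key=scores.get, reverse=True):
        if scores[var] < f_to_enter:
            break  # every remaining variable has a lower F value still
        if all(abs(pearson(data[var], data[s])) < r_limit for s in selected):
            selected.append(var)
    return selected
```

In the test data used to check this sketch, a variable that is nearly a multiple of the first one entered is rejected despite its high F value.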

Following each step the program also re-checks the F values of all the
variables selected for classification up to that step, since their F values
will change with each added variable. If any of the previously selected
variables has an F value lower than a given threshold ("F to remove") it will
be removed from the classification set, and the procedure is repeated again
for all the remaining non-classifying variables to select a new one with an
acceptable F value and covariance coefficient. The variables selected at each
step are combined to form a linear, optimized classification function
(n-dimensional vector) that maximizes the $S_G / S_V$ ratio.

Once the variable selection process is complete using say m variables, the
cases are classified into groups as points on an m dimensional surface and the
intercase distance is calculated. From these the centroids or group averages
are computed as is the probability of assignment of each case to any given
group. The grouping can be presented graphically in two dimensions as points
indicating the group centroids, or as clusters of points (representing the
individual cases) around the centroids.
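
The assignment step can be illustrated as follows. The sketch substitutes plain Euclidean distance to each centroid and converts the distances to probabilities assuming equal prior probabilities and a unit spherical spread in every group. These are simplifying assumptions made here, not the computation performed by the discriminant analysis program, and the function names are hypothetical.

```python
import math


def centroid(points):
    """Average position of a list of equal-length points."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def assign(case, centroids):
    """Probability of the case belonging to each group.

    case      : list of m variable values
    centroids : dict group label -> centroid (list of m values)
    """
    d2 = {g: sum((x - c) ** 2 for x, c in zip(case, cen))
          for g, cen in centroids.items()}
    d_min = min(d2.values())
    # Subtracting the smallest squared distance keeps the exponentials stable.
    w = {g: math.exp(-0.5 * (d - d_min)) for g, d in d2.items()}
    total = sum(w.values())
    return {g: v / total for g, v in w.items()}
```

A case far closer to one centroid than to any other receives a probability that is indistinguishable from unity for that group.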

4.  A Comparative Evaluation of the Statistical Classification Procedures.

As  stated  above  each  of  the  three  classification  procedures  tested  by  us 
has  more  desirable  and  less  desirable  features  when  compared  to  the  others. 

To  illustrate  this  we  applied  the  three  procedures  to  the  same  set  of  data, 
namely the analysis of 39 spectra from urine of 14 patients with alcoholic liver
disease  compared  with  26  spectra  from  13  healthy  adults. 


The results of the Wilcoxon test on 100 peaks of lowest p values are given
in  Figure  10.  The  WNI  values  of  the  65  cases  for  the  two  groups  using  1/p 
weighting  as  well  as  the  values  of  D  =  WNI  (1)  -  WNI  (2)  are  given  in  the  same 
figure.  One  can  see  here  that  all  the  D  values  of  the  pathological  samples 
(LV)  are  negative  and  smaller  than  -35,  whereas  those  of  all  the  controls  (CP) 
are  positive,  with  the  exception  of  case  CP22  which  is  negative  (-9.5)  but 
still  significantly  larger  than  any  of  the  LV's.  In  other  words  this 
classification  did  not  show  any  overlap  between  the  tested  groups.  A 
computerized  graphical  presentation  of  the  WNI  data  is  presented  in  Figure  11 
where  each  case  is  presented  by  its  WNI  (1)  and  WNI  (2)  values.  We  see  here 
the  clustering  of  the  two  groups  with  a  distinct  region  of  demarcation  between 
them.  Although  WNI  (1)  =  WNI  (2)  for  a  case  would  indicate  that  it  equally 
belongs  to  the  two  groups,  this  is  not  necessarily  true  when  applied 
statistically to two groups where at least one cluster has D values very
different  from  zero. 

The clustering analysis applied to the same cases is presented in Figure
12. Here cases 1 to 23 were of liver patients, and cases 24-48 were of
controls. We see here that clustering began with cases 17, 13 and 6, followed
by cases 19 and 11. By the end of the clustering process all samples 1 to 24
(the pathological cases) were in one cluster with case 44 (of the controls)
being the next one added to this cluster (but only in step 41 in the
amalgamation order). All the other controls are again in a distinctly different
cluster.  One  may  also  distinguish  some  sub-clusters  like  cases  1,  2,  10  and 
11  or  15,  21,  17,  16,  and  20  among  the  pathological  samples.  Since  this 
program  does  not  presume  any  predetermined  groups  for  classification  these 
subgroups  may  have  some  biochemical  features  in  common  in  addition  to  being 
part  of  a  liver  disease  population. 

Figure 13 presents part of the computer output of the Euclidean distances
between the points representing each case on an n dimensional surface. (The
distances from the 48 points to just 15 points of other cases are shown in this
figure.) These distances are then amalgamated or clustered as discussed above.

Figure  14  presents  the  same  variables  and  their  respective  intragroup 
averages  used  for  the  subsequent  stepwise  discriminant  analysis.  Figure  15 
shows  the  first  two  steps  of  selection  of  the  two  first  variables  for 
classification,  namely  mass  peaks  197  and  95  respectively.  The  same  figure 
then  describes  step  #15  of  the  procedure  by  which  15  peaks  were  selected  by 
the  "F-to-enter"  and  covariance  criteria.  Figure  16  presents  a  later  stage  in 
the procedure - step #40, when just 30 variables (peaks) were selected as
parameters in the classification, indicating that 10 variables originally
selected were subsequently rejected due to F values below the "F-to-remove"
threshold.

Figure  17  presents  the  distances  from  each  group  centroid  to  the  points 
representing  each  of  the  cases  calculated  from  the  30  variables  finally 
selected.  This  table  gives  also  the  probability  of  assignment  of  each  case  to 
a given group. One can see that each case in the L group has been assigned
to  it  with  a  probability  of  unity  and  similarly  each  case  in  the  C  group 
(controls)  has  been  assigned  to  this  group  with  a  probability  of  unity. 

Figure  18  is  a  graphical  presentation  of  the  same  data  recalculated  for  2 
dimensional  projection.  The  digits  1  and  2  are  the  positions  of  the  centroids 
of  each  group  respectively.  The  resolution  of  the  computerized  printout  is 
limited, so that if more than one case falls on the same overall unit area on
the plot they will be presented by just a single mark. Therefore, only 19
points are shown for the pathological sample and coincidentally 19 points were
printed for the controls. The actual coordinates for each of the 48 cases are
presented in Figure 17.

We see that the stepwise discriminant analysis has separated the cases in
a  more  "decisive"  manner  than  by  the  WNI  test  (Figure  11  vs  Figure  18).  This 
is  not  surprising  in  view  of  the  fact  that  it  was  given  the  "best"  preselected 
50  variables  and  its  own  procedure  used  only  30  out  of  these  for  the  ultimate 
classification.  Although  the  quantitative  and  graphical  presentations  by  this 
procedure  are  the  most  "convincing"  among  the  three  types  of  classification, 
it  is  the  clustering  analysis  that  demonstrated  in  an  utterly  unbiased  fashion 
that  in  this  study  we  had  just  two  biochemically  distinct  groups. 

The Wilcoxon-WNI program was recently modified to classify unknown
samples. First, two known groups are compared: "p-values" for each m/e and
average spectra for each group are calculated. Second, WNI's are calculated
for each unknown sample from the average spectra of the two given groups. The
weighting factor used was $p^{-1/2}$. This modification of the Wilcoxon-WNI
program  provides  unbiased  classification  of  unknown  samples  because  they  do 
not  contribute  to  the  calculations  of  the  "p-values"  and  the  average  spectra 
of  given  groups,  and  it  gives  a  direct  comparison  with  the  classification  of 
unknown  samples  by  discriminant  analysis. 

As  stated  elsewhere  in  this  report  we  shall  proceed  with  additional 
comparative  statistical  analyses  of  other  series  of  biological  samples, 
including  sets  less  distinctly  different  than  the  one  shown  here,  before 
deciding  which  statistical  treatment  is  most  suitable  for  a  given  problem. 

D.  EXPERIMENTAL  RESULTS 

1.  Hospital Patient Studies

The  entire  multicomponent  mixture  analysis  procedure  has  been  tested  using 
four  sets  of  pathological  and  three  sets  of  control  samples.  Two  sets  of 
samples  representing  liver  malfunction  have  been  obtained.  One  set  of  eight 
samples  includes  a  variety  of  diseases,  including  malignancies,  with  liver 
involvement  as  a  primary  or  secondary  reason  for  hospitalization.  Another  set 
of 12 samples is more homogeneous, representing primarily alcoholic hepatitis.
A sample set from 7 individuals hospitalized in the same ward with the
alcoholic hepatitis patients was evaluated as a control set. The reason for
studying patients with liver disorders was to obtain data on patients with
liver involvement other than viral hepatitis, which has been studied by us
earlier in this program (Clin. Chem. 22, 1503 (1976)).

Samples were provided by the staff of Children's Hospital from 7 pneumonia
cases (probably of viral origin), 12 diarrhea patients diagnosed as of viral
origin (Rotavirus), and 6 samples selected from patients hospitalized for
concussion and tonsillectomy, situations unlikely to involve an infectious
organism.  An  additional  set  of  13  adult  control  samples  was  obtained  from 
healthy  volunteers  within  the  university. 

Each  sample  was  analyzed  in  duplicate,  interspersing  control  and 
pathological  samples  to  avoid  systematic  errors.  Summed  spectra  were 
assembled  into  data  sets  for  processing  in  the  CYBER  by  either  the  Wilcoxon- 
WNI  programs  or  the  BMDP  programs. 

The results of the Wilcoxon-WNI programs are presented for several data
sets in Figures 19 through 25. The graphs display WNI(1) versus WNI(2) for
each sample in both groups. The WNI(1) axis represents the difference of a
given sample from the average spectrum of the pathological group while the
WNI(2) axis shows the difference of a spectrum from the control group class
average.  In  this  representation,  control  samples  should  fall  close  to  the 
vertical  axis  and  a  maximum  distance  from  the  horizontal,  while  pathological 
samples  should  fall  near  the  horizontal  axis  and  away  from  the  vertical  axis. 
The  comparison  of  the  diarrhea  in  children  with  children  controls  (Fig.  19) 
shows  two  control  samples  falling  within  the  pathological  group,  when  all 
masses are used with a weighting function of 1/p to compute the WNI's. The
use of only the eight masses of lowest p-value (Fig. 20) not only shows complete
separation of control and diarrhea cases, but seems to reveal a distinct
grouping among the diarrhea cases that may be the result of a different
response to the infecting organism, or perhaps due to a different etiology of
the disease. For several of the diarrhea cases, whose samples were
collected within a few days of each other, rotavirus was detected by
independent tests.

FIGURE 19.  WNI PLOT OF CHILDREN'S CONTROL SAMPLES (CC-CLOSED CIRCLES)
VS DIARRHEA IN CHILDREN (DC-OPEN CIRCLES) USING ALL m/e VALUES
WEIGHTED BY 1/p FOR COMPUTING WNI'S

The  pneumonia  cases  show  one  individual  from  the  pathological  set  falling 
within  the  control  group  using  WNIs  based  on  the  14  peaks  of  lowest  p-value 
(Fig.  21).  The  hospital  records  for  this  child  indicated  that  this  patient 
was  discharged  the  day  after  the  sample  was  taken,  so  it  might  be  surmised 
that  this  child  had  already  recovered  from  the  infection.  Adding  this 
individual  to  the  control  group  and  removing  the  sample  pair  from  the 
pathological  group  did  not  appreciably  change  the  clustering  of  samples 
obtained  before,  indicating  that  this  sample  properly  belongs  in  the  control 
group. 

An  example  of  differential  diagnosis  is  given  in  Fig.  22  comparing  the 
diarrhea  to  the  pneumonia  cases.  Incomplete  separation  was  obtained  using  all 
masses  weighted  as  1/p  in  the  WNI  calculation. 

Figures  23  and  24  show  the  mixed  set  of  liver  disorders  compared  to 
healthy  adult  controls,  using  all  masses  weighted  by  1/p  and  using  only  the  12 
masses  of  lowest  p-value.  Again  the  use  of  only  the  diagnostic  peaks  improves 
the  separation  between  the  two  classes.  Comparing  the  alcoholic  hepatitis  set 
with  hospital  controls  (Fig.  25)  reveals  a  weakness  in  the  choice  of  this 
control  set,  so  that  although  the  alcoholic  hepatitis  samples  are  tightly 
clustered,  a  number  of  samples  from  the  control  set  fall  within  this 
grouping.  In  this  experiment  patients  with  gastric  ulcer,  functional  gall 
bladder  disease,  and  anemia  appear  to  have  similar  profile  patterns  to  those 


FIGURE 20: CONTROL (CC-CLOSED CIRCLES) VS DIARRHEA SAMPLES (DC-OPEN CIRCLES)
USING 8 m/e VALUES OF LOWEST p VALUES TO COMPUTE WNI'S

FIGURE 21: CHILDREN'S CONTROL SAMPLES (CC-CLOSED CIRCLES) VS
CHILDREN'S PNEUMONIA SAMPLES (PC-OPEN CIRCLES) USING 14 m/e VALUES FOR
WNI'S.

FIGURE 22: PNEUMONIA SAMPLES (PC-CLOSED CIRCLES) VS DIARRHEA SAMPLES
(DC-OPEN CIRCLES)

FIGURE 23: ADULT CONTROL SAMPLES (CA-CLOSED CIRCLES) VS MIXED LIVER
DISORDERS (HA-OPEN CIRCLES) USING ALL m/e VALUES.

FIGURE 24

FIGURE 25: HOSPITALIZED CONTROLS (CL-CLOSED CIRCLES) VS
ALCOHOLIC LIVER DISEASE (LV-OPEN CIRCLES)


with  liver  disorders.  This  example  illustrates  the  importance  of  obtaining 
well  characterized  samples  upon  which  to  base  the  initial  pattern  analysis. 

Comparing  the  alcoholic  hepatitis  set  with  the  adult  controls  (Fig.  11) 
shows  a  much  clearer  separation.  In  both  comparisons  the  set  of  pathological 
samples appears to be a well defined homogeneous group. The use of the adult
control  set  in  the  comparison  generates,  however,  a  different  set  of  masses 
with  the  lowest  p-values.  In  addition  the  p-values  were  2  to  3  orders  of 
magnitude smaller using the adult controls. Tabular results of the
Wilcoxon-WNI analysis of these samples are presented in Figure 10.

These  two  groups  of  samples  were  also  analyzed  using  the  BMDP  programs. 
First the BMDP P3D program was used to calculate the t-statistic, separate and
pooled,  for  the  peak  intensities  at  each  m/e  value  for  the  null  hypothesis  of 
equivalent  means  for  both  the  pathological  and  control  groups.  Based  on  these 
results  51  masses  were  selected.  A  significant  number  of  the  masses  selected 
by  this  method  also  show  low  p-values  by  the  Wilcoxon  test.  Using  these 
selected  variables  both  cluster  and  stepwise  discriminant  analysis  programs 
were  used  to  classify  the  individual  cases.  These  results  have  been  described 
in  section  3D. 
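
The variable selection step described above can be illustrated with a minimal sketch. It is not the BMDP implementation: it assumes that "separate" and "pooled" refer to the separate-variance and pooled-variance forms of the two-sample t-statistic, and it ranks masses by the absolute separate-variance value, which is a choice made here for illustration. The function names and all intensities are hypothetical.

```python
import math
from statistics import mean, variance

def t_statistics(group_a, group_b):
    """Return (pooled_t, separate_t) for two lists of peak intensities."""
    na, nb = len(group_a), len(group_b)
    ma, mb = mean(group_a), mean(group_b)
    va, vb = variance(group_a), variance(group_b)  # sample variances (n - 1)
    # Pooled-variance form: assumes both groups share one variance.
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    pooled = (ma - mb) / math.sqrt(sp2 * (1 / na + 1 / nb))
    # Separate-variance form: each group keeps its own variance.
    separate = (ma - mb) / math.sqrt(va / na + vb / nb)
    return pooled, separate

def select_masses(pathological, control, n_keep):
    """Rank m/e values by |separate-variance t| and keep the n_keep largest.

    Both arguments map an m/e value to a list of intensities, one per sample.
    """
    scores = {}
    for mz in pathological:
        _, t_sep = t_statistics(pathological[mz], control[mz])
        scores[mz] = abs(t_sep)
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]

# Hypothetical intensities for three m/e values (not data from this report).
pathological = {114: [10.0, 11.0, 12.0], 180: [5.0, 5.5, 4.5], 95: [3.0, 9.0, 6.0]}
control = {114: [10.5, 11.5, 11.0], 180: [8.0, 8.5, 7.5], 95: [4.0, 8.0, 6.5]}
selected = select_masses(pathological, control, n_keep=1)
```

With equal group sizes the two forms of the statistic coincide; they diverge when the groups differ in both size and variance, which is presumably why both were reported.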

2.  Longitudinal  Studies  on  Virus  Infected  Patients. 

Controlled  longitudinal  studies  were  carried  out  on  two  sets  of  volunteers 
at  the  USAMRIID.  One  group  consisted  of  seven  individuals  who  received  live 
virus  vaccine  for  sandfly  fever.  Two  additional  individuals  in  this  group 
received  a  placebo  injection.  None  of  the  participants  were  told  whether  they 
received  the  vaccine  or  control  injection.  Morning  urine  samples  were 
collected from participants 4 days prior to the injection, the day of the
injection,  for  8  consecutive  days  subsequent  to  injection  and  finally  28  days 
after  the  injection. 


All  samples  were  stored  frozen  without  preservative,  shipped  to  this 
laboratory  packed  in  dry  ice,  and  kept  frozen  at  -20°C  until  analysis.  We 
initially  analyzed  samples  from  three  individuals  in  the  sandfly  fever 
experiment  without  any  prior  knowledge  of  the  sample  classification  (vaccine 
vs.  control)  or  the  expected  time  and  duration  of  the  symptoms.  The  samples 
from  each  individual  were  prepared  and  analyzed  in  duplicate  in  a  random 
sequence.  In  a  few  cases  during  this  series,  a  sample  had  to  be  repeated  due 
to  instrument  malfunction  during  data  acquisition.  In  these  cases  we  found 
that better replicate samples were obtained by using the salt saturated urine
remaining  from  the  previous  extraction  rather  than  using  the  original  refrozen 
sample.  One  possible  explanation  is  that  a  bacterial  contamination  may  have 
affected  the  original  samples  during  their  exposure  to  room  temperature.  We 
did not observe similar differences in repeated analyses of samples that were
collected and stored with ZnSO4 as a preservative. However, additional
samples  of  the  same  urines  obtained  from  USAMRIID  at  a  later  date  and  analyzed 
9  months  later  showed  substantial  changes  from  the  first  set  of  samples, 
indicating  that  storage  at  -20°C,  at  least  without  a  preservative  to  inhibit 
enzymatic  activity,  does  not  guarantee  the  preservation  of  sample  composition. 

The data from the first three individuals of the sandfly fever experiment showed
significant  temporal  differences  in  the  patterns  of  two  of  the  individuals 
(Joffe,  LeBlanc)  compared  to  the  third  (Berry). 

Throughout the sampling period Berry's profiles show significantly lower
intensity  over  the  entire  mass  range.  This  individual  also  apparently  did  not 
consume  any  caffeinated  beverages  during  the  study,  leading  to  a  further 
qualitative  difference  in  this  person's  pattern  compared  to  the  other  two 
participants.  We,  therefore,  did  not  include  Berry's  patterns  in  the  initial 
analysis of the data. (After the analysis of all samples was completed, we
were informed that this individual had served as one of the controls.)
With only two individuals to examine we were concerned that
dietary  variations  plus  individual  variations  in  the  response  to  the  vaccine 
might  obscure  the  detection  of  the  pathological  pattern  or  location  of  the 
maximum  response  period.  Two  approaches  were  partially  successful  in  handling 
this  problem  of  deciphering  an  unknown  pattern  appearing  at  an  unknown  time  in 
a  small  data  set. 

In the first approach the BMDP P7D program for variable stratification was
employed  to  show  the  means  and  intensity  distributions  for  each  m/e  value  as  a 
function  of  the  sample  collection  sequence.  An  example  is  shown  in  Figure 
26.  This  analysis,  although  tedious  and  inefficient  for  routine  use,  did 
reveal  a  substantial  number  of  masses  showing  changes  in  intensity 
distributions at days 3, 4 and 5. The second approach involved "arbitrarily"
comparing the patterns of days -4 and 0 against those of days 3, 4, and 5.

Days  3-5  were  also  the  expected  time  for  development  of  symptoms  for  the 
particular  virus  employed. 

The  t-test  program  P3D  was  used  to  select  40  m/e  values  showing  maximum 
differences  between  the  control  period  (days  -4  and  0)  and  the  expected 
response  period  (days  3  through  5).  The  clustering  analysis  program  was 
applied  to  the  two  virus  infected  sample  sets  using  the  selected  m/e  values 
from  the  baseline  days  -4  and  0,  day  2,  and  days  3  through  5,  the  latter 
presumably  showing  changes  associated  with  the  response  to  the  virus 
infection.  The  results  of  this  analysis  are  shown  in  Figure  27.  It  should  be 
noted  that  this  analysis  involves  no  preselection  of  the  number  of  groups 
needed to classify the samples. The results show the control days -4 and 0 in
one cluster and the presumed virus response days (days 3, 4, and 5)
in a second cluster. The day 2 samples from one individual clustered with the
control days group of samples while the corresponding samples from the other
individual belonged to the group of samples of days 3 to 5.
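
To illustrate what clustering without a preselected number of groups means, the following is a minimal sketch of agglomerative clustering. It is a stand-in, not the BMDP cluster program: average linkage and Euclidean distance are assumptions made here, and the sample labels and two-variable intensity vectors are hypothetical. The merge history shows which samples join at what distance, so any grouping emerges from the data rather than being imposed beforehand.

```python
import math
from itertools import combinations

def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs of samples."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))

def agglomerate(points):
    """Repeatedly merge the two closest clusters; return the merge history.

    points maps a sample label to its intensity vector. Each history entry is
    (members_of_first, members_of_second, linkage_distance).
    """
    clusters = [(label,) for label in points]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], points))
        history.append((a, b, average_linkage(a, b, points)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return history

# Hypothetical two-variable profiles (not data from this report).
points = {
    "day-4": (1.0, 1.0), "day0": (1.2, 0.9),
    "day3": (4.0, 4.2), "day4": (4.1, 3.9), "day5": (3.9, 4.0),
}
history = agglomerate(points)
final_a, final_b, final_distance = history[-1]
```

A large jump in linkage distance at the last merge, relative to the earlier merges, is what indicates two well separated groups in this toy example.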

The  same  40  variables  were  used  in  the  discriminant  analysis  program  to 
derive a classification function for separating days -4 and 0 from days 3, 4,
and 5. This classification function was used to classify sample days 2, 7, 8
and  28  of  the  same  individuals.  These  results  are  shown  in  Figure  28.  The 
classification  of  day  2  samples  was  the  same  as  obtained  in  the  clustering 
analysis.  We  have  been  informed  by  USAMRIID  that  these  individuals  showed  a 
similar  difference  in  their  rates  of  return  to  a  normal  profile  following  the 
response  period.  Neither  subject  developed  symptoms  prior  to  the  middle  of 
day  3  and  both  subjects  were  free  of  symptoms  by  day  7.  The  observed  pattern 
differences  in  morning  urine  samples  on  day  3  thus  preceded  the  appearance  of 
clinical symptoms and persisted following their disappearance. The third set
of  samples  from  a  control  subject  was  classified  as  healthy  throughout  by  the 
same  classification  procedure. 
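
The idea of deriving a classification function from the baseline and response days and then applying it to other days can be sketched as follows. This is a deliberately simplified stand-in for the stepwise discriminant analysis program: it assumes independent variables with a pooled per-variable variance and assigns a sample to the nearer class mean. All names and numbers are hypothetical.

```python
from statistics import mean, variance

def fit(baseline, response):
    """Return (baseline means, response means, pooled per-variable variances).

    Each argument is a list of samples; each sample is a list of intensities
    at the selected m/e values.
    """
    n_vars = len(baseline[0])
    mean_b = [mean(s[k] for s in baseline) for k in range(n_vars)]
    mean_r = [mean(s[k] for s in response) for k in range(n_vars)]
    dof = len(baseline) + len(response) - 2
    pooled = [
        ((len(baseline) - 1) * variance([s[k] for s in baseline])
         + (len(response) - 1) * variance([s[k] for s in response])) / dof
        for k in range(n_vars)
    ]
    return mean_b, mean_r, pooled

def classify(sample, model):
    """Assign the sample to the class with the smaller variance-scaled distance."""
    mean_b, mean_r, pooled = model
    d_b = sum((x - m) ** 2 / v for x, m, v in zip(sample, mean_b, pooled))
    d_r = sum((x - m) ** 2 / v for x, m, v in zip(sample, mean_r, pooled))
    return "baseline" if d_b <= d_r else "response"

# Hypothetical training samples (two selected m/e values each).
baseline_days = [[1.0, 5.0], [1.2, 5.4], [0.8, 4.6]]   # e.g. days -4 and 0
response_days = [[2.0, 3.0], [2.4, 3.2], [1.6, 2.8]]   # e.g. days 3 through 5
model = fit(baseline_days, response_days)

# Hypothetical samples from days not used to derive the function.
early_sample = classify([1.1, 4.9], model)
late_sample = classify([1.9, 3.1], model)
```

Samples from days outside the training set, such as day 2 or day 7, are classified by the same function without refitting, which mirrors the procedure described in the text.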

Unfortunately  in  this  series  no  samples  were  collected  beyond  day  8  so 
that  it  is  impossible  to  determine  at  this  point  the  time  of  unambiguous 
disappearance  of  the  pathological  pattern.  In  any  case  this  pattern  in  urine 
seems  to  appear  prior  to  the  clinical  symptoms  and  to  persist  beyond  the  time 
of  their  disappearance. 

The  samples  from  the  remaining  participants  in  this  study  were  shipped  to 
us  at  a  later  date  and  days  -4,  0,  3,  4  and  5  from  persons  receiving  the 
vaccine  were  selected  for  initial  analysis.  During  the  preparation  of  this 
set  the  addition  of  EDTA  was  inadvertently  omitted  from  the  extraction 
procedure.  Our  statistical  analysis  of  this  sample  set  shows  a  detectable 
pattern  difference  due  to  this  change  in  the  procedure  making  a  direct 
classification  of  all  seven  vaccinated  individuals  more  complicated.  The  use 
of the t-test comparing normal and "pathological" periods of this last set of
samples leads to a selection of variables that correctly classifies these
samples using the discriminant analysis program (see Figure 29). The use of
these variables with the discriminant analysis program to classify samples
from all seven individuals into four groups was also successful. However,
when  we  attempted  to  analyze  the  total  sequence  of  samples  of  2  individuals  of 
the  new  set,  together  with  the  urine  samples  of  one  of  the  two  individuals  of 
the  first  set  (LeBlanc)  some  6  months  later  (9  months  after  the  first  set  of 
samples was analyzed - in the interim we carried out the CI study and the in
vitro experiments described below), we were disappointed to find that the
composition of the samples had changed significantly. Although distinction
between  the  normal  and  pathological  days  was  still  possible,  the  separation 
was  not  as  distinct.  The  results  are  readily  apparent  looking  at  Figure  30. 
This  figure  shows  a  cluster  analysis  of  the  original  LeBlanc  samples 
(designated  by  L)  and  the  LeBlanc  samples  that  were  analyzed  8  months  later 
(designated  by  SFFL)  using  the  40  masses  which  were  selected  by  the  t-test  of 
the old LeBlanc and Joffe samples of the control and pathological days.

Looking  at  the  "old"  LeBlanc  samples,  one  notices  a  definite  separation 
between  the  control  days  (-4  and  0,  designated  L01  and  LN41)  and  the 
pathological  days  (3  through  5,  designated  L3,  L4,  and  L5).  It  can  be  seen 
that there are two clusters: one of L41, L51, and L31, and one of all other samples. The
"new" LeBlanc samples (designated SFFN4L2, etc.), both control and pathological,
form  a  cluster  together  with  the  control  days  of  the  "old"  LeBlanc.  One 
notices,  however,  that  in  that  cluster  there  is  still  a  definite  difference 
between  days  4  and  5  of  the  new  LeBlanc  (SFF4L2  and  SFF5L2)  and  the  rest  of 
the  samples  in  that  cluster.  However,  day  3  (SFF3L2)  clusters  with  the 
Clearly the samples have changed with time. Of all the participants in
the sandfly fever study, LeBlanc had the most fever hours and originally he
had the strongest profile pattern. The other participants had fewer fever
hours and one might not be surprised, therefore, that their patterns were not
as  distinct.  The  samples  of  some  of  the  other  participants  (also  analyzed  8 
months  after  the  original  samples)  gave  similar  results  to  the  chemical 
ionization  work  -  a  less  distinctive  separation. 

Since considerable care and effort are required in the FI methodology we
also tested a simplified profiling procedure consisting of analysis of urease
digested urine by isobutane chemical ionization mass spectrometry. The sample
size of one microliter was empirically established as allowing a relatively
rapid sample evaporation, while still maintaining a large excess in the
isobutane reagent ion intensity. The longitudinal sample sets of each
individual were prepared as described in section 82, and analyzed in duplicate
in  a  random  sequence.  For  each  sample  set  an  aliquot  of  a  "standard"  urine 
from  a  healthy  subject  was  incubated  with  enzyme  at  the  same  time.  A  sample 
of  this  standard  urine,  as  well  as  a  water  blank  containing  the  enzyme  and 
buffer,  were  analyzed  at  the  beginning  of  each  sample  set  and  a  replicate  of 
the  standard  sample  was  repeated  at  the  completion  of  each  set  of  samples  of  a 
given  subject. 

One  objective  of  this  methodology,  a  high  sample  throughput,  was  achieved 
in that the entire set of longitudinal samples from all nine participants,
approximately  200  samples  including  blanks  and  standards,  was  completed  in  one 
The patterns obtained were less informative than those of the
samples extracted and analyzed by FI. This is not surprising since the sample
size used limits the constituents examined by this procedure to species
excreted at levels above one mg per day. A representative spectrum shown
in  Figure  31  is  dominated  by  the  protonated  molecular  ions  of  creatinine  (m/e 
114)  and  hippuric  acid  (m/e  180).  Direct  examination  of  urine  by  field 
ionization  using  50-75  microliters  (Figure  32)  gives  an  almost  identical 
pattern  except  for  the  appearance  of  some  constituents  one  amu  lower  as 
non-protonated  molecular  ions. 

Analyses  of  the  CIMS  patterns  by  the  same  programs  applied  to  the  field 
ionization  data  yielded  a  rather  weak  separation.  A  t-test  selection  of 
variables  and  discriminant  analyses  using  a  subset  of  single  replicate  samples 
from  all  infected  subjects  resulted  in  a  classification  function  that 
separated  the  control  days  (-4  and  0)  from  the  fever  days  (4  and  5). 
Classification of each person's sample series by this function gave,
however, 8% false positives and 27% false negatives. This result reflects the
low signal to noise ratio of diagnostic information in this completely
non-selective pattern of major urinary constituents which is not expected to
be  as  significantly  affected  by  the  mild  infection  resulting  from  the 
experimental  vaccination. 
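
For reference, false positive and false negative percentages of the kind quoted above can be computed as in the sketch below. The report does not state the denominators used; this sketch assumes each rate is taken relative to the number of samples in its true class. The labels are hypothetical and do not reproduce the 8% and 27% figures.

```python
def error_rates(true_labels, predicted_labels):
    """Return (false_positive_percent, false_negative_percent).

    Assumes a false positive is a control-day sample classified as "fever"
    and a false negative is a fever-day sample classified as "control", each
    expressed relative to the number of samples in its true class.
    """
    false_pos = false_neg = n_control = n_fever = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == "control":
            n_control += 1
            false_pos += predicted == "fever"
        else:
            n_fever += 1
            false_neg += predicted == "control"
    return 100.0 * false_pos / n_control, 100.0 * false_neg / n_fever

# Hypothetical labels, not the report's data.
truth = ["control"] * 4 + ["fever"] * 4
predicted = ["control", "control", "control", "fever",
             "fever", "fever", "control", "control"]
fp_percent, fn_percent = error_rates(truth, predicted)
```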

Although  these  findings  suggest  the  inadequacy  of  the  simplified  sample 
preparation  procedure  for  the  detection  of  subtle  metabolic  changes  associated 
with  the  infection,  they  indicate  that  at  least  in  terms  of  the  major  urinary 
constituents,  no  major  dietary  or  environmental  differences  have  been 
represented  in  this  sample  set.  Thus  the  more  subtle  pattern  differences 
detected  in  the  extracted  samples  are  obtained  in  the  context  of  a  normal, 
unperturbed  baseline  pattern.  However,  another  very  important  aspect  has  to 


be  taken  into  consideration  at  this  point.  These  samples,  which  were  stored 
at  -20°C,  were  approximately  a  year  old  when  they  were  finally  analyzed. 
Results  of  our  earlier  urine  storage  study  indicated  that  samples  stored  at 
-20°C showed little change in pattern after being kept for a month.

Although  at  that  time  these  changes  were  not  significant,  storage  at  -20°C 
for  10  or  12  months  seems  to  be  utterly  inadequate.  The  repetition  of  FI 
analysis on the same samples subsequent to the CI analyses (see above)
corroborated this conclusion. It remains to be seen whether another longitudinal
study on vaccinated individuals, using appropriate preservation and storage,
would be amenable to the simplified CI procedure.

3.  Study  of  Human  and  Animal  Tissue  Cultures  Infected  With  Polio  Virus. 

This  study  was  undertaken  to  determine  the  differences  between  infected 
and  uninfected  cell  cultures  and  to  see  how  soon  after  infection  of  the  cells 
such  differences  can  be  detected  in  the  culture  medium.  The  first  study  was 
performed  with  a  cell  line  of  human  embryonic  lung  and  the  Mahoney  strain  of 
polio  virus.  The  tissue  culture  work  was  carried  out  by  Dr.  Howard  Faden  and 
his  associates  at  the  Children's  Hospital  in  Buffalo. 

The  human  embryonic  lung  culture  was  originally  established  by  Dr.  Fishaut 
of  Denver  by  7  to  8  passes  of  tissue  obtained  from  an  aborted  fetus.  These 
cells were grown in Eagle's minimal essential growth medium which was
supplemented with 5% newborn calf serum. Cultures with equal numbers of cells
were prepared in 5 ml plastic culture tubes that were incubated at 37°C for
zero, 6, and 24 hours. For each of these three time points, nine tubes were
prepared. Three tubes were left uninfected, three were infected with 1/10 ml
of viral solution with a tissue culture infective dosage (TCID50) of 1 x
10^3, and three tubes with a TCID50 of 1 x 10^4.


These  titers  of  virus  were  allowed  to  adsorb  onto  the  cells  for  one  hour. 
At  that  time  the  cells  were  washed  with  isotonic  phosphate  buffer  solution  to 
remove  excess  virus  particles.  Following  the  wash,  new  medium  was  added  to 
the  tubes,  and  the  cells  were  incubated  for  the  times  indicated.  When  the 
incubation  times  were  reached,  the  tubes  were  refrigerated  and  centrifuged  to 
separate  the  cells.  We  received  the  chilled,  cell  free,  supernate  of  this 
centrifugation. 

To ensure inactivation of the virus, 20 microliters of 0.3 M HgCl2 was
added to each tube. To one ml of this medium we added 0.25 ml of cold 70%
perchloric acid and 0.25 ml of cold 11 M KOH to precipitate the proteins. The
supernate of each tube was removed and diluted with an equal volume of
distilled water to facilitate pH adjustment. This diluted medium was then
titrated with KOH and HClO4 to a pH of 2, 7, or 10, respectively. A 150
microliter sample was removed at each pH and placed in a glass culture tube
which contained a folded 3 cm x 1 mm strip of Whatman fiberglass paper GF/C.
The samples were evaporated onto this paper by blowing warm, dry nitrogen over
them. The paper, with the sample dried onto it, was then introduced into the
mass spectrometer.

Initially  high  and  zero  virus  samples  incubated  for  zero  and  24  hours  from 
each  of  the  3  tubes  were  prepared  at  pH  2,  7,  and  10  in  duplicate.  The  data 
were  first  analyzed  using  the  t-test  to  find  peaks  showing  differences  at  24 
hours  between  zero  and  high  virus.  Using  masses  selected  in  this  manner, 
cluster  analysis  and  discriminant  analysis  programs  were  used  to  classify  the 
four  groups  consisting  of  24  hours  high  and  zero  virus  and  0  hours  high  and 
zero  virus.  A  better  separation  among  the  four  groups  was  observed  for  the 
samples prepared at pH 2 (see Figures 33 through 35). Based on this analysis
pH  2  was  selected  for  preparation  of  high  and  low  virus  samples  from  6  hours 
and  24  hours. 


DISCRIMINANT ANALYSIS, ACIDIC pH

1, A = Virus 24 h
2, B = Control 24 h

The initial classification into four groups revealed several trends in the
patterns. First there was a small but consistent difference in the zero time
patterns of different virus concentrations, indicating a change in the
composition of the media due to the addition of virus. This is an artifact not
normally considered in cell culture experiments and it can be eliminated in
future studies by making all virus inoculations, including the zero virus,
using a common medium. Since that experiment we have tried to overcome this
artifact by using UV light to deactivate the virus in the so-called "virus
free" samples and using deactivated virus medium as diluent in the control
experiments.  We  have  shown  that  UV  irradiation  sufficient  to  deactivate  the 
virus  completely  does  not  introduce  significant  changes  in  the  mass 
spectrometric  profile. 

Second, the incubation time and cell growth lead to time dependent changes in the
intensity  of  selected  m/e  values.  We  observed  both  increases,  which  may  be 
due  to  cell  metabolites  excreted  into  the  media,  and  decreases  possibly 
reflecting  depletion  of  nutrients  in  the  media.  The  third  detectable  trend  is 
for  the  virus  infected  tubes  to  show  similar  time  dependent  trends  but  of 
reduced  magnitude,  consistent  with  an  inhibition  in  the  effective  cell  growth 
rate.  That  is,  the  same  peaks  which  increase  or  decrease  in  uninfected 
cultures show a smaller increase or decrease, respectively, in the infected
cultures.  Finally  and  more  interestingly,  there  are  a  limited  number  of  m/e's 
showing  intensity  changes  that  may  specifically  reflect  virus  activity.  These 
include  cases  where  virus  infected  cultures  show  larger  changes  in  the  same 
direction  observed  in  non-infected  tubes. 

Two  additional  t-tests  were  used  to  obtain  new  groups  of  m/e  values.  By 
performing  the  t-test  on  zero  versus  24  hours,  at  high  virus  concentration,  a 
set of peaks was obtained reflecting both time and virus dependent pattern changes.


Similarly,  a  set  of  zero  time  media  composition  dependent  peaks  was  found 
using  the  t-test  to  compare  high  and  zero  virus  at  zero  time.  Three  new  sets 
of  variables  were  generated  from  the  initial  42  variables  selected  using  the 
t-test  comparison  between  zero  and  24  hours.  For  each  set  of  variables  the 
discriminant  analysis  program  was  used  to  develop  a  classification  function 
for  five  groups  consisting  of  the  following: 


group 1: all zero hour samples
group 2: high virus, 6 hrs
group 3: high virus, 24 hrs
group 4: zero virus, 6 hrs
group 5: zero virus, 24 hrs

A  set  of  27  masses  was  selected  from  the  above  42  by  removing  15  masses 
that  also  appeared  in  the  zero  hour  test  group. 

The  results  are  summarized  in  the  canonical  variable  plots  of  the  group 
average  locations  in  Figures  36  through  38.  The  separation  obtained  with  27 
variables  (Figure  36)  was  increased  when  two  additional  zero  time  dependent 
masses were removed as shown in Figure 37. Reducing these 25 variables to 21
by eliminating 4 additional peaks in common with the zero time difference led
to  reduced  separation  as  shown  in  Figure  38.  Thus  it  is  not  possible  to 
eliminate  all  zero  time  artifacts  without  losing  some  of  the  time  and  virus 
specific  information  contained  in  the  variables  removed  in  the  third  test. 

This  compromise,  however,  will  hopefully  not  exist  in  future  experiments 
designed  to  avoid  initial  differences  in  media  composition. 

These  preliminary  results  on  the  "metabolic  profile"  of  tissue  culture 
media indicate that the virus infection can be detected in vitro 6 hours
following  exposure.  We  have  continued  our  in  vitro  study  of  tissue  cultures 
by  first  improving  on  the  sample  handling  procedure  (in  addition  to  removal  of 
the  zero  time  artifact  by  UV  irradiation  as  described  above). 
At this stage we tested several different methods to eliminate protein
from the tissue culture media, including precipitation with 70% perchloric
acid, precipitation with ethanol, and filtration using Millipore immersible-CX
ultrafilters (10,000 molecular weight cut-off). Comparing the mass spectra of
the same media using the different pre-separation techniques, it was found
that protein removal was not necessary, since the protein seems to have little
effect on the profile pattern. In fact, the results (Figures 39-41) indicate
that certain metabolites are lost through co-precipitation with the protein.
Therefore it was decided to analyze the next series of tissue culture samples
without removing the protein. We have also optimized the pH of the mass
spectrometrically analyzed samples to pH 2.2, where variances due to
imprecise pH adjustment are minimal.

In our next study virus infected cultures were prepared by adding 0.1 ml
of medium containing a tissue culture infective dose (TCID50) of 10^3 or
10^4 virus particles/cell of the Mahoney strain of polio virus. After allowing
one  hour  for  virus  adsorption  to  cells,  the  infected  cultures  were  washed  with 
isotonic  phosphate  buffer  (5  mmol/L)  to  remove  excess  virus  particles  and  the 
tubes  were  refilled  with  fresh  medium.  Control  samples  were  pre-incubated 
with  a  UV  inactivated  aliquot  of  the  virus  culture  used  for  infected  samples. 

A  cell  line  of  human  embryonic  lung  tissue  was  cultured  in  Eagle's  minimal 
essential growth medium supplemented with 5% newborn calf serum. Cultures
with equal numbers of cells, prepared in 5 ml plastic culture tubes, were
incubated at 37°C. Three tubes each were prepared for analysis at zero, 6,
12, 24, and 48 hours for both infected and control cultures. Cell cultures
were inoculated with a solution consisting of 0, 10, or 100% of the growth
medium containing active virus particles. At the selected incubation times
tubes were refrigerated (4°C) and then centrifuged (1000 x g) to obtain the
cell-free medium for analysis.

The one ml thawed samples were treated with HgCl2 to achieve a final
concentration of 5 mM, which is sufficient to inactivate any virus particles.
The pH was adjusted to 2.2 using NaOH or HCl. Two separate 50 microliter
aliquots  were  taken  and  applied  to  strips  of  glass  fiber  filter  paper.  The 
samples  were  dried  and  stored  at  room  temperature  for  periods  of  one  day  or 
less  prior  to  analysis.  Both  field  ionization  and  isobutane  chemical 
ionization  molecular  profiles  were  obtained  from  this  sample  set. 

In  the  case  of  the  field  ionization  data,  the  zero  hour  samples  were 
compared  in  all  possible  pairs  for  the  control,  low  and  high  virus  levels 
using  the  t-test  for  variable  selection.  Using  a  p-value  threshold  of  0.01 
these  tests  gave  between  6  and  16  m/e's  out  of  300  indicating  a  slight  but 
significant  difference  between  the  zero  hour  patterns.  This  difference  does 
not  appear  to  be  due  to  a  photochemical  effect  of  the  UV  inactivated  media 
used  in  the  preincubation  of  two  of  the  cultures.  A  photochemical  effect 
should be most apparent when comparing the control (100 percent UV treated)
and high virus (0 percent UV) samples. A similar pattern difference should be
apparent when comparing the low virus (90 percent UV) to the high virus (0
percent UV). The first of the above t-tests gave 16 m/e's, the second pair
gave only 6, and the third pair (control - 100 percent UV vs low virus - 90
percent UV) gave 13 m/e's with p ≤ 0.01 in each test. The small number of
masses  found  in  the  second  t-test  and  the  fact  that  there  was  only  1  mass  in 
common  with  the  first  test  is  inconsistent  with  a  photochemical  artifact.  The 
three  t-tests  described  gave  28  different  m/e  values  with  only  7  m/e's  in 
common among the various pairs. Further studies will attempt to locate
possible systematic errors in the experimental procedure that produce the
observed zero hour differences. The zero hour differences that were observed
in this experiment do not, however, have any common features with the more
important metabolic or virus related pattern differences studied at other
incubation times. The possibility that there is an actual biological virus
effect expressed during the pre-incubation period cannot be excluded at this
point and it will be carefully examined in future experiments.
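
When 300 m/e values are each tested at a p-value threshold of 0.01, some will pass by chance alone. The sketch below gives a rough sense of scale for the counts of 6 to 16 reported above. It assumes the 300 tests are independent, which neighboring m/e values in a spectrum may well not be, so the probabilities are indicative only. The function name is hypothetical.

```python
from math import comb

def binomial_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))

n_masses = 300      # m/e values tested, from the text
threshold = 0.01    # p-value threshold, from the text

# Number of masses expected to pass the threshold with no real difference.
expected_by_chance = n_masses * threshold

# Chance of seeing at least 6, or at least 16, passing masses under the null.
p_at_least_6 = binomial_tail(n_masses, threshold, 6)
p_at_least_16 = binomial_tail(n_masses, threshold, 16)
```

Under these assumptions about three masses are expected by chance, so a count of 16 is far harder to attribute to chance than a count of 6.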

To select pattern changes due to the normal cell growth, a t-test
comparing control samples at zero and 48 hours was used. Using 25 m/e's, eight
different discriminant analyses were performed. Two representative
classifications are shown in Figures 42 and 43. First the time sequence
samples were classified for each of the three virus levels (0, 10^3 or 10^4
TCID) using the zero and 48 hour samples to derive the classification
function. These results are shown in Figure 44. The figure gives the
coordinates  of  the  group  averages  for  the  most  significant  canonical  variable 
derived  from  the  classification  function.  The  control  samples  show  the 
largest  range  of  values,  consistent  with  the  fact  that  the  zero  and  48  hour 
control  sets  were  used  for  the  initial  variable  selection. 
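
The variable selection step described above can be sketched as follows. This is a minimal illustration and not the BMDP program used in the study: it ranks m/e values by the absolute value of a pooled-variance two-sample t statistic and keeps the highest ranked ones. The function names and the intensities are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for two lists of intensities."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def select_masses(group_a, group_b, n_keep=25):
    """Rank m/e values by |t| between two sample groups and keep the top n_keep.

    group_a and group_b map each m/e value to a list of normalized
    intensities. When every m/e has the same group sizes, ranking by |t|
    gives the same order as ranking by p-value.
    """
    scores = {mz: abs(t_statistic(group_a[mz], group_b[mz])) for mz in group_a}
    return sorted(scores, key=scores.get, reverse=True)[:n_keep]


# Hypothetical intensities for three m/e values, six samples per group.
zero_hr = {
    101: [1.0, 1.1, 0.9, 1.0, 1.05, 0.95],
    117: [2.0, 2.2, 1.9, 2.1, 2.0, 1.8],
    133: [0.5, 0.6, 0.4, 0.5, 0.55, 0.45],
}
hr_48 = {
    101: [1.0, 1.0, 1.1, 0.9, 1.0, 1.0],
    117: [3.0, 3.1, 2.9, 3.2, 3.0, 2.8],
    133: [0.7, 0.6, 0.8, 0.7, 0.65, 0.75],
}
selected = select_masses(zero_hr, hr_48, n_keep=2)
print(selected)
```

In the study the same ranking would be applied to all measured m/e values, keeping the 25 with the lowest p-values.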

The  classification  of  the  control  group  (A)  thus  reflects  the  progressive 
appearance  of  metabolic  changes  in  the  growth  medium  as  a  function  of 
incubation  time.  The  low  virus  samples  (C)  show  a  similar  trend,  although 
with  a  greatly  reduced  range  of  coordinate  values  and  an  apparent  sequence 
reversal  of  the  6  and  12  hour  samples.  The  high  virus  classification  shows  a 
slightly  distorted  sequence  and  an  even  smaller  range  of  values.  The  last  two 
classifications  also  show  increasing  overlap  of  samples  between  groups.  This 
result  is  consistent  with  our  previous  experiments  and  shows  a  progressive 
pattern  change  due  to  cell  growth  that  is  inhibited  by  virus  infection.  The 
above  sequence  of  classification  also  shows  that  the  inhibition  increases  with 
the  dose  level  of  the  virus. 


FIG. 42: Discriminant analysis classification of control tissue cultures versus
incubation time. Numerals indicate group average locations: 1 = zero hr;
2 = 6 hr; 3 = 12 hr; 4 = 24 hr and 5 = 48 hr samples. m/e values selected
from 48 hr change in control samples.


FIGURE 44


Classification coordinates of group averages for discriminant analyses of
experimental groups vs time. (The first row is the same classification shown
in Figure 42.) m/e values selected from 48 hr change in control samples.

                                     Incubation Time (hrs)
Group                             0       6      12      24      48
Control Group, 0 TCID, A       2147    1153     872     260   -2000
Low Virus, 10^3 TCID, C         15?      72     144     -25    -136
High Virus, 10^4 TCID, B         11     -23    0.13     -10     -12

Next,  five  discriminant  analyses  were  performed  to  classify  the  three 
virus  levels  at  each  individual  incubation  time.  These  results  are  shown  in 
Figure  45.  The  table  gives  the  distances  between  the  group  averages  and  the 
average or worst case spread for the groups. The worst case value is
given for groups having much larger variance, with the subscript denoting the
group with the largest spread. In this analysis the zero hour separations are
smaller than those obtained for any other incubation time, indicating again
that the major experimental artifacts encountered in our first experiment have
been largely removed by using UV inactivated cultures as the diluent for all
samples. Since the variables initially selected for this classification
reflect primarily the normal metabolic processes, the separation observed can
be attributed to a differential inhibition of these processes by the two
virus levels. The first row in the table suggests that for the highest virus
level the maximum pattern difference occurs at 12 hours. With the lower virus
level the maximum signal is delayed to 24 hours. The third row, showing the
difference between the two virus levels, is also maximized at 12 hours. In
interpreting the data in this table it is important to note that each column
represents a separate discriminant analysis, so that distances should not be
compared along a row without also relating them to the other figures in their
respective  columns. 
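
The caution about comparing distances along a row can be made concrete by relating each distance to the within-group spread of its own column. The sketch below uses the Figure 45 entries for 0 to 24 hours; the 48 hour column is left out because its spread entry is not legible in this copy. Dividing by the column spread is an illustrative choice made here, not a procedure stated in the report, and the function name is hypothetical.

```python
# Between-group distances and within-group spread from Figure 45,
# keyed by incubation time in hours (48 hr omitted: spread entry unclear).
distances = {
    "A-B": {0: 11, 6: 152, 12: 1140, 24: 630},
    "A-C": {0: 23, 6: 150, 12: 180, 24: 567},
    "B-C": {0: 25, 6: 25, 12: 1320, 24: 66},
}
spread = {0: 3, 6: 2, 12: 160, 24: 2}


def relative_separation(distances, spread):
    """Divide every distance by the spread of its own column (one analysis)."""
    return {
        pair: {t: d / spread[t] for t, d in row.items()}
        for pair, row in distances.items()
    }


relative = relative_separation(distances, spread)
for pair, row in relative.items():
    print(pair, {t: round(v, 1) for t, v in row.items()})
```

On this relative scale the 12 hour column carries less weight than its raw distances suggest, because its within-group spread (160) is far larger than that of the other columns.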

Another set of 25 variables was selected using the t-test to compare the
samples infected with the highest virus level at zero and 48 hours. Using
these variables, discriminant analyses were performed as above to classify the
three different time sequences and to classify the three virus levels at each
incubation time.


FIGURE 45


Distance between group averages from discriminant analysis classification of
different virus levels at various incubation times. m/e values selected from
48 hour changes in control group (A).

                    Incubation Time (hrs)
Groups (a)         0   6 (c)      12      24      48
A-B               11     152    1140     630      90
A-C               23     150    180*     567     45*
B-C               25      25    1320      66     44*
s (b)              3       2     160       2     7CV

a) Group designations: A = control, 0 TCID virus level; B = high, 10^4 TCID
virus level; C = low, 10^3 TCID virus level.

b) Average within group spread; subscript indicates worst case single group
spread.

c) This row is the same classification shown in Figure 43.

* Indicates incomplete separation between groups.

There was only one m/e value in common between peaks selected by this
t-test and by the previous t-test. This drastic change may indicate either that the
virus infected cells have a significantly altered metabolic profile or that
the normal cells have an altered metabolism at 48 hours compared with earlier
times due to self inhibition from excreted products or depletion of nutrients.

For  this  set  of  masses  the  time  sequence  classifications  became  more 
confused,  with  considerable  overlap  between  groups  throughout  the  incubation 
series. This is not surprising in view of the lack of overlap between this
set of variables and the previous one, which presumably reflects normal
metabolic changes.

The  classification  of  different  virus  levels  at  each  time  is  considerably 
improved  using  this  new  set  of  variables  as  seen  in  Figure  46.  The  first 
feature  to  note  is  that  the  variance  within  each  group  is  consistently  small 
with  the  exception  of  the  zero  hour  classification  where  the  results  indicate 
the  expected  overlap  of  all  three  groups.  As  before  the  separation  between 
different  virus  levels  is  strongest  at  earlier  times  and  is  maximized  at  6  to 
12  hours. 

This analysis was extended to include t-tests of seven additional group
sets,  using  different  times  of  incubation  for  zero  and  high  virus  levels. 
Figure  47  summarizes  the  results  of  discriminant  analysis  separation  of 
different  virus  levels  at  each  incubation  time  using  variables  from  eight 
different  t-tests.  All  t-tests  yield  variables  able  to  separate  samples  from 
a  given  time  according  to  virus  level.  Variables  selected  at  six  hours 
generally  do  not  lead  to  as  large  a  separation  as  other  time  points. 
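
The separation entries summarized in Figure 47 follow the definition in its caption, read here as the logarithm of the summed between-group distances divided by the average group spread. The following is a minimal sketch under two assumptions: that the logarithm is base 10 (the report does not state the base) and that the ratio is taken before the logarithm. The 6 hour column of Figure 46 serves as input, and the function name is hypothetical.

```python
from math import log10


def separation_entry(between_group_distances, average_spread):
    """log10 of (sum of between-group distances / average group spread)."""
    return log10(sum(between_group_distances) / average_spread)


# 6 hour column of Figure 46: A-B, A-C and B-C distances, average spread 3.
entry_6h = separation_entry([265, 103, 162], 3)
print(round(entry_6h, 2))
```

Taking the logarithm compresses the wide range of distance-to-spread ratios, so that entries from different variable sets and incubation times can be tabulated side by side.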

Using each of these different sets of variables a significant group
separation was found in each case between the "zero" time sets at different
virus levels. These results indicate a real difference between these sets,
suggesting that measurable metabolic differences occur already during the one
hour "preincubation period".


FIGURE 46


Distances between group averages from discriminant analysis classification of
different virus levels at various incubation times. m/e values selected from
48 hour changes in high virus group (B).

                    Incubation time (hrs)
Groups (a)         0       6      12      24      48
A-B              100     265     177      12      56
A-C              59*     103      52      31      52
B-C              41*     162     228      2D      77
s (b)          110_C       3       3       3       4

a) Group designations: A = control, 0 TCID virus level; B = high, 10^4 TCID
virus level; C = low, 10^3 TCID virus level.

b) Average within group spread; subscript indicates worst case single group
spread.

* Indicates incomplete separation between groups.


FIGURE 47


Summary of separation data obtained from eight different variable sets
used to separate three virus levels at each incubation time. Table entries
are computed from distances between group averages obtained by discriminant
analysis.

Separation entry = log [(sum of between group distances) / (average group spread)]

An  examination  of  the  masses  selected  by  the  four  control  group  t-tests 
showed  only  3  masses  in  common,  using  the  m/e's  with  the  25  lowest  p-values 
from each test. For m/e's with p ≤ 0.05 in all tests there were 13 common
variables.  This  rather  low  number  of  common  variables  means  that  different 
metabolic  processes  differentiate  best  between  infected  and  non-infected  cells 
and  between  cells  infected  at  different  levels  of  viral  infection,  at 
different  times  of  incubation.  It  is,  therefore,  difficult  to  distinguish 
between  metabolites  primarily  associated  with  the  viral  infection,  and  those 
primarily  characteristic  of  normal  cell  turnover,  which  is  inhibited  by  the 
infection. 

The low degree of overlap between the variables selected by various
t-tests led us to apply an alternate variable selection for this type of
multi-parameter experiment. The BMDP program P2V was used for analysis of
variance of m/e intensity measurements, using time and virus level as the
grouping factors and examining the individual m/e's as dependent variables.
The virus level group consisted of high and control (non-viable) virus levels
and the time group consisted of 0, 6, 12, 24, and 48 hour incubations. This
program analyzes the dependent variables for effects due to either virus
level, time, or the combination of both. Using this program 33 m/e's were
selected with a level factor significant at p ≤ 0.05. Figure 48 shows the
separation between virus levels at different times using these variables for
discriminant analysis. The degree of separation was similar to that obtained
with the t-test selection.
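
The analysis of variance used for this selection can be illustrated with a simplified sketch. It computes the F statistic for the virus level main effect of one m/e value in a balanced two-way layout (virus level by incubation time). This is an assumption-laden stand-in and not the P2V program itself: it covers only the fixed-effects, equal-replicate case, it stops at the F statistic (the p-value would come from an F table), and the function name and intensities are hypothetical.

```python
from statistics import mean


def level_f_statistic(cells):
    """F statistic for the virus-level main effect in a balanced two-way layout.

    cells[(level, time)] is a list of replicate intensities for one m/e value;
    every cell must hold the same number of replicates.
    """
    levels = sorted({lv for lv, _ in cells})
    times = sorted({t for _, t in cells})
    n = len(next(iter(cells.values())))
    grand = mean(x for values in cells.values() for x in values)

    # Between-level sum of squares, pooled over all times and replicates.
    ss_level = 0.0
    for lv in levels:
        level_mean = mean(x for t in times for x in cells[(lv, t)])
        ss_level += len(times) * n * (level_mean - grand) ** 2

    # Within-cell (error) sum of squares.
    ss_within = sum(
        (x - mean(values)) ** 2 for values in cells.values() for x in values
    )

    df_level = len(levels) - 1
    df_within = len(levels) * len(times) * (n - 1)
    return (ss_level / df_level) / (ss_within / df_within)


# Hypothetical intensities: two virus levels, two times, two replicates each.
cells = {
    ("control", 0): [1.0, 1.2],
    ("control", 6): [1.1, 1.3],
    ("high", 0): [2.0, 2.2],
    ("high", 6): [2.1, 2.3],
}
f_level = level_f_statistic(cells)
print(round(f_level, 1))
```

An m/e value would be retained when its F statistic exceeds the critical value for the chosen significance level and degrees of freedom.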

FIGURE 48


Distance between group averages for discriminant analyses of experimental
groups vs time. Variables (m/e values) selected by analysis of variance with
grouping factors time and virus levels.

a) Group designations: A = control (0 TCID) virus level; B = high (10^4
TCID) virus level; C = low (10^3 TCID) virus level.

b) Average within group spread.


A more rigorous test of the use of these 33 variables was obtained by
using them to derive a classification for both time and virus level
simultaneously. Figure 49 shows the results of a discriminant analysis of
zero, 6, and 12 hour samples of high and zero virus levels, demonstrating
complete separation of all six groups. Whereas the infected samples at 6, 12,
and  24  hours  still  separated  well,  there  was  a  tendency  of  all  three  zero  hour 
groups  to  form  a  cluster  which  partly  overlapped  the  clusters  of  the  infected 
groups  at  6,  12,  and  24  hours.  Also  the  separation  of  the  48  hour  infected 
samples  from  the  other  groups  was  smaller  than  that  of  samples  with  shorter 
times  of  incubation. 

These  findings  corroborate  our  previous  suggestion  that  significant 
metabolic  changes  due  to  viral  infection  of  tissue  cultured  cells  can  be 
monitored  in  the  first  12  hours  and  possibly  as  soon  as  in  the  first  hour  or 
two.  Furthermore,  there  seems  to  be  no  gain  in  diagnostic  information  by 
incubation  for  24  or  48  hours.  Also  it  seems  that  lower  levels  of  infection 
than  those  used  in  these  preliminary  experiments  could  be  detected  within  the 
same  time  frame. 

The chemical ionization patterns for zero and 48 hour cultures were
analyzed using the t-test and discriminant analysis, with results similar to
those obtained from the FI data for the same samples. Other CI samples in this
experimental series (6, 12, 24 hour samples, analyzed on different days)
showed a drastic drop in the source sensitivity, making these patterns totally
different from the other samples. Additional experiments will be undertaken
to establish the proper control of CI parameters for reproducible profile
analyses.

In the latest study, the Mahoney strain of polio virus was cultured in BGM
(Buffalo Green Monkey) cells as follows: a stock solution of the virus was
diluted a thousand fold. One ml was added to a 75 cm^2 flask containing a
confluent monolayer of BGM cells. The virus was adsorbed for an hour. The
cells were fed with 10 ml of Eagle's Minimum Essential Medium (MEM)
supplemented with 2 percent bovine calf serum, and the pH was adjusted using
sodium bicarbonate.


The cells were allowed to incubate at 37°C for 48 hours, until
microscopic  examination  showed  that  90  percent  or  more  of  the  monolayer 
appeared  to  be  infected.  The  cells  remaining  attached  to  the  flask  were 
scraped  off  and  along  with  the  media  were  transferred  to  a  tube  and 
centrifuged  at  2000  rpm  for  ten  minutes  at  4°C.  The  supernate  was  collected 
and  divided  into  2  ml  aliquots  and  snap  frozen  at  -70°C. 

One aliquot was used to titer the virus. This was done by serially
diluting the virus pool ten-fold until a 10^-? dilution of the original pool
was achieved, using 1.8 ml of MEM and 0.2 ml of virus. Each dilution was
inoculated in triplicate using 0.1 ml of virus per tube of BGM cells. The
tubes were inoculated as the flask was and were observed over a 72 hour
period. At that time the tissue culture infective dose (TCID50) was
determined to be 10^7.5 per ml.

Two  2  ml  aliquots  of  the  virus  pool  were  placed  in  two  shallow  petri 
dishes  and  exposed  to  U.V.  light  for  a  time  of  10  and  30  minutes, 
respectively, at a distance of 12.5 cm. The virus was then inoculated into
the BGM cells and observed over a 72 hour period for cytopathic effect (CPE).
No CPE was observed in the tubes inoculated with either the 10 or the 30
minute irradiated virus. Ten
minutes  of  U.V.  irradiation  appeared  to  be  a  sufficient  exposure  time  to 
inactivate  the  polio  virus. 

Two ml of the polio virus pool were diluted to one-tenth with MEM. Three
ml of this were set aside and the remaining 17 ml were U.V. irradiated as
described above. Three serial ten fold dilutions of the tenth diluted virus
pool were done using the irradiated pool as the diluent. This provided three
virus pools: (1) U.V. inactivated, (2) high titer (approximately
10^6.5 TCID50, with a multiplicity of infection (MOI) of 10), (3) low titer
(approximately 10^?.5 TCID50, with a MOI of 0.1). There were approximately
3.5 x 10^5 cells/tube. The growth media was removed from 45 tubes and
replaced with 0.1 ml of virus (that is, 15 tubes per virus pool).

Three tubes from each virus pool were fed 1 ml of MEM and immediately
centrifuged at 2000 rpm for 5 minutes and the supernates were collected.
These were considered "time zero". The rest of the tubes were placed in an
incubator at 37°C for 1 hour to allow all 0.1 ml of virus to adsorb. These
tubes were then fed 1 ml of MEM. Three tubes from each virus pool were
immediately centrifuged and the supernates were collected and labelled T1.
The rest of the tubes were incubated at 37°C, centrifuged, and the supernates
collected 2, 4 and 6 hours after inoculation. Tubes from the 4 and 6 hour
time points were observed for any CPE. After 4 hours no CPE was observed in
either the U.V. inactivated or the low titer tubes. However, in the high
titer tubes less than 10 percent of the cells showed rounding or swelling
which might be described as CPE due to the polio virus. After 6 hours there
were no changes seen in any of the tubes.

The tubes were frozen at -80°C until analysis. The one ml thawed
samples were treated with 20 microliters of 0.25 M HgCl2 to achieve a final
concentration of 5 mM, which is sufficient to inactivate any virus particles.
The pH was adjusted to 2.2 using NaOH or HCl. Two separate 40 microliter
aliquots  were  taken  and  applied  to  strips  of  glass  fiber  filter  paper.  The 
samples  were  dried  and  stored  at  room  temperature  for  periods  of  one  day  or 
less  prior  to  analysis. 
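
The stated final mercuric chloride concentration can be checked with a short dilution calculation. The helper below is a hypothetical illustration; it assumes the volumes are simply additive.

```python
def final_concentration_mM(stock_molar, added_ul, sample_ml):
    """Concentration (mM) after adding added_ul of stock to sample_ml of sample."""
    added_ml = added_ul / 1000.0
    return stock_molar * 1000.0 * added_ml / (sample_ml + added_ml)


# 20 microliters of 0.25 M stock added to a 1 ml thawed sample.
hgcl2_mM = final_concentration_mM(stock_molar=0.25, added_ul=20, sample_ml=1.0)
print(round(hgcl2_mM, 2))
```

The result is slightly below 5 mM because the added 20 microliters also increase the total volume, which agrees with the rounded value quoted in the text.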

As in the previous study, there were triplicate tubes that were run in
duplicate, thereby giving six samples for each virus concentration and each
time period. Each of these individual groups of six samples was analyzed for
its reproducibility. In this study, the average coefficient of
variation for each of the individual groups was approximately 5 to 10
percentage points worse than in the previous study (25 percent vs. 15 percent
over the mass range of 61 to 361 amu; 16 percent vs. 11 percent over the mass
range of 100 to 250 amu). An effort is now underway to determine the reason
for this difference.
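
The reproducibility measure quoted above can be sketched as follows: a coefficient of variation is computed for each m/e value within a group of six samples and then averaged over a mass range. The function names and intensities are hypothetical, and the use of the sample standard deviation is an assumption, since the report does not say which estimator was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(intensities) / mean(intensities)


def average_cv(group, mass_range):
    """Mean CV over the m/e values of one group that fall inside mass_range."""
    low, high = mass_range
    cvs = [
        coefficient_of_variation(values)
        for mz, values in group.items()
        if low <= mz <= high
    ]
    return mean(cvs)


# Hypothetical group: six replicate intensities for each of three m/e values.
group = {
    120: [8, 9, 10, 10, 11, 12],
    300: [18, 19, 20, 20, 21, 22],
    400: [5, 5, 5, 5, 5, 6],
}
cv_wide = average_cv(group, (61, 361))
cv_narrow = average_cv(group, (100, 250))
print(round(cv_wide, 1), round(cv_narrow, 1))
```

Restricting the mass range changes the average because each m/e value contributes its own coefficient of variation, which is why the report quotes separate figures for the two ranges.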

One  of  the  most  striking  aspects  of  this  new  study  was  the  difference  in 
the  number  of  masses  with  p  <  0.05  in  the  control  group  using  the  t-test  of 
zero  hour  versus  six  hours  (see  Table  1).  In  the  previous  study  there  were  81 
masses  with  p  <  0.05  in  a  mass  range  of  61  to  361  compared  with  just  21  masses 
in the latest study. In the case of the high virus levels, the number of
masses was similar in both studies, but again in the latest effort there were
fewer masses (26 vs. 29). Another disturbing feature was the fact that there
was still a "zero hour effect", but it was smaller this time.

Many  parameters  have  changed  since  the  previous  study  so  it  is  difficult 
to  compare  experiments.  It  was  indeed  unfortunate  that  the  original  cell 
line  (human  embryonic  lung  tissue)  died  and  BGM  (Buffalo  Green  Monkey)  cells 
were  substituted.  Certainly,  the  type  and  rate  of  metabolic  changes  would  be 
different  for  various  cell  lines.  This  could  partially  explain  why  there  was 
such a difference between the control samples of each study. Clearly, there
was less activity in the control group this time, possibly attributable to
the slower rate of growth of the BGM cells (six hours is not enough time) or
to an effect of the UV inactivated virus on cell metabolism. The higher
coefficient of variation would undoubtedly have decreased the number of masses
with low p-values.

Overall, the results of the latest study were somewhat disappointing. The
trends seen in the earlier studies were less distinctive in this recent
effort. It is obvious that the higher coefficient of variation contributed to
this result. Analysis of the pooled freeze-dried urine that is used by this
laboratory as a monitor of pattern reproducibility indicated that the average
coefficient of variation due to the mass spectrometer had not changed from the
previous studies. Therefore, it would seem that the tissue culture
experimental techniques have to be improved. A possible way to monitor this
methodology would be to make each cell culture serve as its own control.
Instead of 45 tubes containing cell cultures, 9 tubes would be used: three
with UV inactivated virus, three with low titer virus, and three with high
titer virus. All tubes would be sampled immediately after the addition of
virus and then incubated. Aliquots of the defined media from each tube would
be taken at designated times. A careful study of each tube would be recorded
using the standard tissue culture techniques; that is, the cell growth for the
UV inactivated virus cultures would be measured and the CPE for the virus
infected cultures would be noted. With such short time periods, it obviously
would be important to verify that cell growth had indeed taken place in the
uninfected cultures and that the virus had infected the exposed cells.

All of our virus studies to date have indicated that virus infection can be
detected in vitro within a few hours after exposure. This is consistent with
tracer and other virological experiments which have shown that cells
inoculated with poliovirus experienced pathological changes within hours
after infection.


Diagnosis of Viral Infections by Multicomponent Mass Spectrometric
Analysis

R.  Abbott,  M.  Anbar,  H.  Faden,  J.  McReynolds,  W.  Rieth,  M.  Scanlon,  L.  Verkh,  and  B.  Wolff 


Metabolic  profiles  of  urine  extracts  of  humans  with  viral 
infections,  as  well  as  of  media  of  virus-infected  human 
tissue  cultures,  have  been  analyzed  by  non-fragmenting 
mass  spectrometry  and  compared  with  corresponding 
controls. The spectra were then subjected to several alternative
computerized statistical procedures to detect
diagnostic biochemical profiles. Controlled longitudinal
studies on fully informed, consenting volunteers who received
sandfly fever virus demonstrate the onset of a
characteristic  metabolic  pattern  that  precedes  the  onset 
of  symptoms  and  subsides  when  the  patients  overcome 
the  infection.  Longitudinal  studies  of  human  tissue  cultures 
infected  with  poliomyelitis  virus  demonstrate  characteristic 
metabolic  patterns  within  a  few  hours  after  infection. 
Non-fragmenting  mass  spectrometry  may  thus  provide  the 
clinical  laboratory  with  a  sensitive,  reliable  test  for  viral 
infections  significantly  faster  than  attainable  by  current 
techniques. 

Additional  Keyphrases:  metabolic  profiling  •  urine  • 
viruses • sandfly fever • poliomyelitis • data processing

Diagnostic metabolic profiling of biological fluids as described
by other workers using a variety of techniques (1-5)
can also be obtained by non-fragmenting mass spectrometry
(6). The advantage of non-fragmenting mass spectrometry
is its ability to quantitate hundreds of constituents in a biological
sample within a relatively short time (less than 1 h),
expressing  abundances  in  a  digital  form  ready  for  on-line 
computer  analysis.  No  other  analytical  technique  matches 
these  capabilities.  Multicomponent  metabolic  profiles  lend 
themselves  to  statistical  pattern-recognition  analysis  in  which 
a  characteristic  pattern  of  a  large  number  of  constituents  is 
used  to  classify  the  biological  samples  and  thereby  to  diagnose 
pathological  states. 

A  previous  report  (6)  described  the  use  of  field-ionization 
mass  spectrometry  for  the  multicomponent  analysis  of  urine 
and  the  identification  of  patients  with  an  acute  liver  disorder 
(infectious  hepatitis).  The  metabolic  aberrations  associated 
with  liver  dysfunction  are  extensive,  and  the  characteristic 
changes  in  urine  composition  that  occur  could  be  recognized 
by several alternative clinical chemistry techniques. The potential
diagnostic usefulness of mass-spectrometric metabolic
profile  analysis  thus  awaited  a  more  challenging  problem. 

We have since improved our field-ionization mass-spectrometric
instrumentation (7), and have compared it with
chemical-ionization mass spectrometry. We improved considerably
the sample preseparation treatment, and have also
explored the possibility of minimizing these procedures,
limiting them to the enzymic removal of urea. We thus compared
the diagnostic information obtainable from the major
constituents in urine with that obtained from a certain subset
of separated metabolites, which includes many minor constituents.
We have also tested and compared alternative statistical
data-handling procedures for the diagnostic classification
of metabolic profiles. The previously reported method,
a Wilcoxon test followed by computation of a weighted noncorrelation
index for comparison of patterns, has been replaced
with a set of programs from the BMDP package (8),
which have greater flexibility for the treatment of multiple
group classifications. These improvements in methodology
have resulted in a more sensitive and specific diagnostic
technique, enabling us to identify characteristic metabolic
profiles in urines of children with pneumonia or with virus-induced
diarrhea.

A  more  critical  test  of  the  technique,  however,  was  the 
identification  of  transient  metabolic  changes  induced  in 
human  subjects  after  inoculation  with  sandfly  fever  virus  to 
produce  a  mild,  self-limited  infection.  These  transient  changes 
were  identified  in  a  longitudinal  study  over  a  period  of  one 
month,  each  subject  serving  as  his  own  control.  In  this  paper 
we  also  describe  preliminary  results  of  the  in  vitro  detection 
of the presence of a virus in a biological sample by characteristic
metabolic profile changes in the medium of an infected
tissue  culture.  Our  findings  thus  corroborate  our  preliminary 
suggestion  (6)  that  non-fragmenting  mass  spectrometry  has 
the  potential  of  becoming  a  highly  powerful  diagnostic  tool 
in  the  clinical  laboratory. 

Materials  and  Methods 

Longitudinal  Study  of  Virus  Infection 

Nine fully informed, consenting volunteers participated in
this study, which was conducted by the U.S. Army Medical
Research Institute for Infectious Disease (USAMRIID).
During the study subjects stayed in a hospital ward and received
a similar diet. Seven individuals received sandfly fever
virus and two individuals received placebo injections. The
participants were not told whether they received the virus or
a control injection. Morning urine samples were collected four
days before the injection, the day of the injection, for eight
consecutive days thereafter, and finally 28 days after the injection.
Samples were frozen without preservative, shipped
to our laboratory packed in solid CO2, and kept frozen until
analysis. The study with volunteers was conducted as a portion
of long-term investigations at USAMRIID concerned
with the diagnosis, prevention, and treatment of infectious
diseases, and was extensively reviewed and approved in accordance
with existing U.S. Army Regulations. The collection
of urine samples for various clinical tests allowed us to acquire
the specimens described herein.

Tissue-Culture  Virus  Infection  Study 

A cell line of human embryonic lung tissue was cultured in
Eagle’s  minimal  essential  growth  medium  supplemented  with 
50  mL  of  newborn  calf  serum  per  liter.  Cultures  with  equal 
numbers  of  cells,  prepared  in  5-mL  plastic  culture  tubes,  were 
incubated  at  37  °C.  Three  tubes  each  were  prepared  for 
analysis  at  0,  6,  and  24  h  for  both  infected  and  uninfected 
cultures.  Virus-infected  cultures  were  prepared  by  adding  0.1 
mL of medium containing a tissue-culture infective dose

Table 1. Stepwise Discriminant Analysis of Longitudinal Samples from Two Individuals Receiving
Sandfly Fever Virus, Based on 40 m/z Values (Variables)

a Classification function based on days -4, 0 (Control) and 3, 4, 5 (Fever) used to classify samples from all other days.
b — indicates D^2 greater than 10^4.


range  1  to  450  were  acquired  and  stored  by  the  data  system. 
The probe and source were baked to 250 °C for 10 min between
each sample. After bakeout, the source temperature was
quickly  returned  to  200  °C  by  using  a  copper-tipped  solid 
probe  to  conduct  heat  away,  through  a  heat  pipe.  Because  the 
source  high  voltage  was  turned  off  during  cooling,  and  because 
the  calibration  mix  itself  “washes”  adsorbed  components  out 
of  the  source,  a  new  calibration  was  obtained  for  each 
sample. 

Chemical-Ionization  Mass  Spectrometry 

Chemical-ionization  mass  spectra  were  obtained  with  a 
DuPont  21-491B  mass  spectrometer  (DuPont  Instruments, 
Wilmington,  DE  19898)  equipped  with  a  chemical-ionization 
source;  isobutane  reagent  gas  was  supplied  at  a  source  pressure 
of 0.3 to 0.5 Torr (40-70 Pa). Mass resolution of 500 and a 10-s
cyclic scan from 1 to 550 amu gave integrated counts of 1.5 ×
10^5 to 2 × 10^5 for 1 pg of adenine evaporated from the probe,
corresponding to 30 pC/pg sensitivity. For analysis of
urease-digested urine samples we programmed the solid probe
input power to evaporate the entire sample within 45 scans
in 7.5 min. The Finnigan-Incos Model 2400 data system was
used also with this instrument to acquire and store spectra.

Data  Processing 

Raw spectra, consisting of m/z centroids and peak areas
acquired in single scans, were added together by the data
system to give an integrated spectrum of the total number of
ions detected at each nominal mass during the evaporation
of the sample. Added spectra were transmitted to a CYBER
173 central computer for further processing (Control Data
Corp., Minneapolis, MN 55440). Each spectrum was normalized
to unit area by excluding from the normalization sum
all peaks greater than 5% of the total ion count (6). We used
the normalized spectra for statistical analysis.
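
The normalization rule can be sketched as below. The sketch assumes that peaks above the 5% cutoff are left out of the normalization sum but are still scaled by it; the text does not say whether such peaks were retained, and the function name and spectrum are hypothetical.

```python
def normalize_spectrum(spectrum, cutoff=0.05):
    """Scale intensities by the summed area of peaks at or below the cutoff.

    spectrum maps nominal mass to integrated ion count. Peaks above `cutoff`
    (as a fraction of the total ion count) are excluded from the
    normalization sum but are still divided by it.
    """
    total = sum(spectrum.values())
    norm_sum = sum(v for v in spectrum.values() if v <= cutoff * total)
    return {mz: v / norm_sum for mz, v in spectrum.items()}


# Hypothetical spectrum: one dominant peak plus forty small peaks.
spectrum = {60: 600.0}
spectrum.update({100 + i: 10.0 for i in range(40)})
normalized = normalize_spectrum(spectrum)
small_area = sum(v for mz, v in normalized.items() if mz != 60)
print(round(small_area, 6), normalized[60])
```

Excluding the dominant peaks keeps a single abundant constituent from rescaling the minor peaks that carry most of the pattern information.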

The  Mr  profiles  of  the  different  sample  groups  previously 
described have been analyzed by several multivariate statistical
programs of the BMDP package (8). Programs P7D
(description  of  groups  with  analysis  of  variance),  P3D  (t-test), 
P2M  (clustering  analysis),  and  P7M  (stepwise  discriminant 
analysis)  were  used  for  different  steps  of  the  statistical  analysis 
of  the  diagnostic  patterns.  These  programs  were  documented 
in  a  manual  published  by  UCLA,  and  only  a  brief  description 
of  the  function  of  each  program  will  be  given  here. 

The P7D program is used to display the mean and variation
of individual mass intensities for two or more persons as a
function of the sampling day in the longitudinal study. The
P3D program tests, for each variable, the null hypothesis that
the values in the two tested groups belong to the same
population. The clustering analysis program P2M calculates
the n-space Euclidean distance between each pair of spectra
(cases), using the scaled normalized intensities of chosen m/z
values (variables) as single coordinates. From this distance
matrix the program constructs a diagram showing the relative
proximity of samples in the space determined by the selected
variables.
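
The distance calculation underlying the clustering step can be sketched
as follows. This is an illustration rather than the BMDP P2M code: the
function names are hypothetical, and scaling each variable to zero mean
and unit standard deviation is an assumption about what "scaled" means
here.

```python
import math
from statistics import mean, stdev


def scale_variables(cases):
    """Scale each variable (column) to zero mean and unit standard deviation.

    `cases` is a list of spectra, each a list of intensities at the chosen
    m/z values. Constant variables are mapped to 0.0.
    """
    columns = list(zip(*cases))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return [
        [(x - c) / s if s > 0 else 0.0 for x, c, s in zip(row, centers, spreads)]
        for row in cases
    ]


def distance_matrix(cases, scale=True):
    """Return the matrix of Euclidean distances between all pairs of cases."""
    points = scale_variables(cases) if scale else cases
    return [[math.dist(p, q) for q in points] for p in points]
```

Each spectrum is treated as one point in n-space, with one coordinate
per selected m/z value; the resulting symmetric matrix is the input
from which a clustering diagram would be built.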

The P7M stepwise discriminant analysis program constructs
an optimized classification function as a linear combination
of variables that achieves the best separation of the
given spectra into a specified number of groups. Specifically,
the program maximizes the ratio of the sum of squares of the
between-group variance to that of the within-group variance.
As each variable is added into the classification function, the
program recalculates an F-statistic for all variables, adding
variables that exceed a specified “F to enter” or removing
variables that fall below a specified “F to remove.” The
multiple correlation coefficient of entered variables with previ-

Fig. 2. Isobutane chemical-ionization integrated mass spectrum
of 1 μL of urease-treated urine


by  day  7.  The  observed  pattern  differences  in  morning  urine 
samples  thus  preceded  the  appearance  of  clinical  symptoms 
and  persisted  after  their  disappearance.  The  third  set  of 
samples,  from  a  control  subject,  was  classified  as  healthy 
throughout  by  the  same  classification  procedure.  Samples 
from  the  other  six  participants  in  this  study  were  recently 
obtained  from  USAMRIID,  and  work  is  in  progress  to  obtain 
the  field-ionization  profiles  of  extracted  metabolites  from 
these  samples. 

Because this method requires considerable care and effort,
we are limited to the analysis of 40 to 50 samples per week;
consequently, we also tested a simplified profiling procedure
consisting of analysis of the urease-digested urine by isobutane
chemical-ionization mass spectrometry. The sample size (1
μL) was empirically established as that which would allow a
relatively rapid sample evaporation while still maintaining
a large excess in the isobutane reagent ion intensity. The
longitudinal sample sets for each individual were prepared as
described above and analyzed in duplicate in a random
sequence. For each sample set an aliquot of the corresponding
urine from a healthy subject was incubated with enzyme at
the same time. A sample of this standard urine and a water
blank containing the enzyme and buffer were analyzed at the
beginning of each sample set, and a replicate of the standard
sample was repeated at the completion of each set of subject
samples.


Fig. 3. Field-ionization integrated mass spectrum of 50 μL of
untreated urine


Certainly,  we  achieved  one  objective  of  this  method — a  high 
sample  throughput — in  that  the  assay  of  the  entire  set  of 
longitudinal  samples  from  all  nine  participants,  approximately 
200  samples  including  blanks  and  standards,  was  completed 
in  one  week. 

The patterns obtained were less informative than the data
from the extracted samples. This is not surprising, because
the sample size used limits the constituents examined to
species excreted at more than 1 mg per day. A representative
spectrum (Figure 2) is dominated by the protonated molecular
ions of creatinine (m/z 114) and hippuric acid (m/z 180).
Direct examination of urine by field ionization of 50-75 μL
samples (Figure 3) gives an almost identical pattern except
for the appearance of some constituents as non-protonated
molecular ions (1 amu lower).

Analyses of the chemical-ionization mass-spectrometric
patterns by the same programs applied to the field-ionization
data indicated that all samples are rather similar; no
unambiguous differentiation of the “response” period was obtained.
A weak separation was, in fact, obtained with the
chemical-ionization patterns. A t-test selection of variables and
discriminant analyses of a subset of single replicate samples
from all infected subjects resulted in a classification function
that separated the control days (-4 and 0) from the fever days
(4 and 5). Classification of each person’s sample series by this
classification function gave 8% false positives and 27% false
negatives. This result reflects the low signal-to-noise ratio of
diagnostic information in this completely nonselective pattern
of major urinary constituents, a ratio that is not expected to be
significantly affected by the mild infection resulting from the
experimental virus illness.
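
For reference, error rates of this kind can be tallied as in the sketch
below. The denominators are an assumption, since the text does not
define them: false positives are counted as a fraction of the true
control samples and false negatives as a fraction of the true fever
samples. The function name and the labels are hypothetical.

```python
def error_rates(actual, predicted, positive="fever"):
    """Return (false-positive rate, false-negative rate) for two label lists.

    Assumption: the false-positive rate is relative to the actual negative
    samples, and the false-negative rate to the actual positive samples.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    pairs = list(zip(actual, predicted))
    n_negative = sum(1 for a, _ in pairs if a != positive)
    n_positive = sum(1 for a, _ in pairs if a == positive)
    false_pos = sum(1 for a, p in pairs if a != positive and p == positive)
    false_neg = sum(1 for a, p in pairs if a == positive and p != positive)
    fp_rate = false_pos / n_negative if n_negative else 0.0
    fn_rate = false_neg / n_positive if n_positive else 0.0
    return fp_rate, fn_rate
```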

Although these findings demonstrate the inadequacy of the
simplified sample preparation procedure for the detection of
subtle metabolic changes associated with the infection, they
indicate that, at least in terms of the major urinary
constituents, no major dietary or environmental differences have
been represented in this sample set. Thus the more subtle pattern
differences detected in the extracted samples are obtained in
the context of a normal, unperturbed baseline pattern.

The  analytical  and  statistical  procedures  reported  here  have 
considerable  potential  for  clinical  diagnosis.  In  particular  the 
high  analytical  throughput  need  not  be  sacrificed  when  more 
selective  sample  preparation  methods  are  utilized  to  increase 
the  diagnostic  information.  A  certain  amount  of  chemical 
selectivity  without  elaborate  preparation  could  also  be 
achieved  by  the  use  of  other  reagent  gases  (10). 

The  statistical  procedures  we  used  might  be  needed  only 
for  establishing  the  initial  diagnostic  criteria  for  a  particular 
disease.  The  classification  function  could  then  be  utilized  in 
a  minicomputer  data-acquisition  system,  to  rapidly  classify 
unknown  samples. 
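
As a rough illustration of how little computation such a classification
step needs, the sketch below scores an unknown normalized spectrum
against one linear classification function per group and assigns it to
the group with the highest score. The data layout, the function name,
and every coefficient shown in the usage note are hypothetical; none
are values from this study.

```python
def classify(intensities, functions):
    """Assign a spectrum to the group whose linear function scores highest.

    intensities: {m/z: normalized intensity} for the unknown sample.
    functions:   {group: {"constant": c0, "coefficients": {m/z: c}}}.
    Returns (best_group, scores). m/z values absent from the sample
    are treated as zero intensity.
    """
    scores = {}
    for group, function in functions.items():
        score = function["constant"]
        for mz, coefficient in function["coefficients"].items():
            score += coefficient * intensities.get(mz, 0.0)
        scores[group] = score
    best_group = max(scores, key=scores.get)
    return best_group, scores
```

For example, with two made-up functions
{"healthy": {"constant": -1.0, "coefficients": {101: 2.0, 137: 0.5}},
"infected": {"constant": -2.0, "coefficients": {101: 0.5, 137: 4.0}}},
a sample with intensities {101: 0.2, 137: 0.6} would be assigned to
"infected".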

Virus-Infected  Tissue  Cultures 

Mr profiles of media from cultures incubated for 24 h were
prepared at pH 2, 7, and 10 to determine the optimum
conditions for classification. Masses selected by the t-test
procedure were used as variables for the P7M discriminant
analysis program. We classified 0- and 24-h samples of
infected and noninfected cultures at different pH values into
four groups, to select the pH range giving the best separation;
pH 2 was the most effective. Analyses of infected and
uninfected samples from 6-h incubation at pH 2 are shown in
Figure 4.

The  initial  classification  revealed  three  major  trends  in  the 
patterns.  First,  small  differences  in  the  0-h  patterns  suggest 
a  memory  effect,  reflecting  composition  differences  in  the 
media  applied  during  the  virus-adsorption  period.  Because 
the  intermediate  wash  is  mild  enough  not  to  remove  adsorbed 
virus  particles,  low  Mr  metabolic  constituents  from  the  in- 

Two additional t-tests were used to obtain new groups of
m/z values. By performing the t-test on 0 vs 24 h, we obtained
a set of peaks that reflected both time- and virus-dependent
pattern changes. The initial set of 42 variables was finally
reduced to 25 by deleting 17 m/z values that showed
significant differences in the 0-h t-test. For these 25 variables the
discriminant analysis program derived a classification
function to separate samples into five groups: A, all 0-h samples;
B, 6-h infected samples; C, 24-h infected samples; Y, 6-h
noninfected samples; and Z, 24-h noninfected samples (Figure
5). The probability of assignment of individual spectra to the
corresponding group is presented in Table 2.
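
The variable-reduction logic described above can be sketched as
follows. This is an illustration under stated assumptions, not the
BMDP P3D program: a pooled-variance two-sample t statistic is used, the
significance cutoff t_crit is left to the caller, the 0-h test is
assumed to compare infected with noninfected cultures, and all function
names are hypothetical.

```python
import math
from statistics import mean, variance


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (each group needs n >= 2)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    diff = mean(a) - mean(b)
    if pooled_var == 0:
        # Degenerate case: no spread in either group.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(pooled_var * (1 / na + 1 / nb))


def significant_mz(group_a, group_b, t_crit):
    """Return the m/z values whose |t| between the two groups exceeds t_crit.

    Each group is {m/z: [intensities of that peak across samples]}.
    """
    return {
        mz
        for mz in group_a
        if mz in group_b and abs(pooled_t(group_a[mz], group_b[mz])) > t_crit
    }


def select_variables(zero_h, later_h, zero_infected, zero_control, t_crit):
    """Keep peaks that change from 0 h to a later time, then drop peaks
    that already differ between infected and control cultures at 0 h."""
    changed = significant_mz(zero_h, later_h, t_crit)
    biased_at_zero = significant_mz(zero_infected, zero_control, t_crit)
    return changed - biased_at_zero
```

The set difference in the last step corresponds to deleting the m/z
values that were significant in the 0-h t-test from the initial set.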

The zero-time difference observed in these experiments can
be avoided by preincubating control samples with an
ultraviolet-radiation-inactivated aliquot of the virus culture used
for infected samples. A more difficult task will be the
deconvolution of possible virus-specific changes in patterns from
the unavoidable cell-growth-dependent changes. We intend
to extend this feasibility study by further experiments at
additional time points with different viruses and host cell lines
as well as cultures with intermediate concentrations of viruses.
These studies should clarify the potential value of Mr profile
analysis for early virus detection in tissue cultures.

Since our last report we have demonstrated excellent results
in the classification of additional liver disorders (cirrhosis, in
particular), pneumonia, viral-induced diarrhea, and urinary
infections (unpublished results); the biochemical aberrations
in all these diseases were, however, rather conspicuous, and
the mass-spectrometric multicomponent analysis did not offer
a unique diagnostic solution. The results reported in this paper
demonstrate a unique capability of our methodology to
identify minute changes in metabolic profiles with a high
degree of certainty. Moreover, these changes were
demonstrable before the onset of clinical symptoms, and persisted
for some time after the clinical symptoms had subsided. The
diagnostic implications of these findings are self-evident. As
indicated by preliminary results, the analysis of metabolic
profiles of tissue-culture media as a means to detect and
possibly identify a viral infection much earlier than feasible
by the current techniques is another highly promising
application of multicomponent analysis by non-fragmenting mass
spectrometry.


This  study  was  sponsored  in  part  by  the  U.S.  Army  Medical  Re¬ 
Facility, UCLA, under NIH Special Research Resources Grant